Python Data Handling Tutorial – Learn NumPy & Pandas for Beginners


Learn Python data handling with this introductory tutorial covering NumPy arrays, Pandas DataFrames, data manipulation, and best practices. Ideal for beginners and developers working on AI, data science, and analytics projects.

1. Introduction to Data Handling in Python

Data handling covers loading, cleaning, transforming, and analyzing data. In Python, two libraries do most of this work: NumPy for numerical arrays and Pandas for labeled, tabular data.

Best Practices:

  1. Always import libraries with standard aliases (import numpy as np, import pandas as pd).
  2. Keep data clean and structured.
  3. Comment your code for clarity.

2. NumPy – Numerical Python

NumPy is a library for handling numerical data efficiently with arrays and matrix operations.

Installation


pip install numpy

Basics


import numpy as np

# Create a 1D array
arr = np.array([1, 2, 3, 4, 5])
print("Array:", arr)

# Create a 2D array
matrix = np.array([[1, 2], [3, 4]])
print("Matrix:\n", matrix)

# Element-wise array operations
print("Plus 5:", arr + 5)     # adds 5 to every element
print("Squared:", arr ** 2)   # squares every element

Key Features:

  1. Multi-dimensional arrays (ndarray)
  2. Vectorized operations (usually much faster than looping over Python lists)
  3. Mathematical functions (sum, mean, sqrt)
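As a brief illustration of these features, the sketch below applies np.sum, np.mean, and np.sqrt to a small array and reads the shape of a 2D array. The values and variable names are made up for the example.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration
values = np.array([1.0, 4.0, 9.0, 16.0])
grid = np.array([[1, 2, 3], [4, 5, 6]])

total = np.sum(values)        # sum of all elements
mean_value = np.mean(values)  # arithmetic mean
roots = np.sqrt(values)       # element-wise square root

# Multi-dimensional arrays expose their shape and support per-axis reductions
column_sums = grid.sum(axis=0)

print("Shape of grid:", grid.shape)
print("Column sums:", column_sums)
print("Total:", total, "Mean:", mean_value)
print("Roots:", roots)
```

Passing axis=0 reduces down the rows, so the result holds one sum per column.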

Best Practices:

  1. Use vectorized operations instead of loops for speed.
  2. Keep arrays homogeneous for performance.
  3. Use descriptive variable names like data_array or matrix_scores.
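The first two practices can be sketched as follows. The example compares an explicit loop with the equivalent vectorized expression, then shows what happens to the dtype when an array mixes numbers and strings. The data and names are hypothetical.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])  # hypothetical data

# Loop version: correct, but processes one element at a time in Python
doubled_loop = np.empty_like(prices)
for i in range(len(prices)):
    doubled_loop[i] = prices[i] * 2

# Vectorized version: one expression applied to the whole array
doubled_vec = prices * 2

print("Same result:", np.array_equal(doubled_loop, doubled_vec))

# Homogeneous numeric data keeps a numeric dtype
numeric = np.array([1, 2, 3])
# Mixing numbers and strings makes NumPy store every element as a string
mixed = np.array([1, "two", 3.0])

print("numeric dtype:", numeric.dtype)
print("mixed dtype:", mixed.dtype)
```

Both versions produce the same values. The vectorized form is shorter and, on large arrays, usually far faster because the loop runs inside NumPy rather than in the Python interpreter.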

3. Pandas – DataFrames & Series

Pandas handles structured data through two main objects: the DataFrame (a table with labeled rows and columns) and the Series (a single labeled column).

Installation


pip install pandas

Basics


import pandas as pd

# Create a Series
s = pd.Series([10, 20, 30, 40])
print("Series:\n", s)

# Create a DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Delhi", "Mumbai", "Bangalore"],
}
df = pd.DataFrame(data)
print("DataFrame:\n", df)

DataFrame Operations


# Access columns
print(df["Name"])

# Filter rows
print(df[df["Age"] > 28])

# Add new column
df["Salary"] = [50000, 60000, 70000]
print(df)

Key Features:

  1. Read/write CSV, Excel, JSON files (pd.read_csv, df.to_csv)
  2. Powerful data filtering and aggregation (groupby, mean, sum)
  3. Handle missing data (dropna, fillna)
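A minimal sketch of these features is shown here. It uses a made-up table with one missing value, and it writes the CSV to an in-memory buffer (io.StringIO) rather than a file on disk so the example stays self-contained. With a real file you would pass a path instead.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical records; one Age value is missing on purpose
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Dana"],
    "City": ["Delhi", "Mumbai", "Delhi", "Mumbai"],
    "Age": [25, np.nan, 35, 40],
})

# Missing data: either drop incomplete rows or fill the gaps
complete_rows = people.dropna()
filled = people.fillna({"Age": people["Age"].mean()})

# Aggregation: mean age per city (missing values are skipped by default)
mean_age_by_city = people.groupby("City")["Age"].mean()

# CSV round trip through an in-memory buffer
buffer = io.StringIO()
people.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(complete_rows)
print(filled)
print(mean_age_by_city)
print(restored.shape)
```

Whether to drop or fill missing values depends on the data. Filling with the column mean is only one option and is used here purely for illustration.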

Best Practices:

  1. Always inspect data using df.head(), df.info(), df.describe().
  2. Use vectorized operations instead of row-by-row loops.
  3. Keep column names meaningful and consistent.
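The inspection calls and the vectorized style can be sketched on a small table. The 10% bonus rule and the column names are assumptions made up for this example.

```python
import pandas as pd

# Hypothetical data for illustration
staff = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Salary": [50000, 60000, 70000],
})

# Inspect the data before working with it
print(staff.head())      # first rows of the table
staff.info()             # prints column dtypes and non-null counts
print(staff.describe())  # summary statistics for numeric columns

# Vectorized: compute the new column for all rows in one expression,
# rather than looping over rows one by one
staff["Bonus"] = staff["Salary"] * 0.10
print(staff)
```

Note that info() prints its report directly, so it does not need to be wrapped in print().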

4. Combining NumPy & Pandas

Pandas is built on top of NumPy, so a DataFrame column can be converted to a NumPy array and passed to NumPy functions for fast computation.

Example:


import numpy as np
import pandas as pd

data = pd.DataFrame({
    "Scores": [80, 90, 75, 85]
})

# Convert to NumPy array for calculations
scores_array = data["Scores"].to_numpy()
average = np.mean(scores_array)
print("Average Score:", average)
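The same pattern also works in the other direction: a NumPy result can be stored back in the DataFrame. The sketch below uses np.where to build a new column from a condition. The pass mark of 80 and the "Result" column are hypothetical and not part of the example above.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "Scores": [80, 90, 75, 85]
})

# Standard deviation computed on the underlying NumPy array
std_dev = np.std(data["Scores"].to_numpy())
print("Std Dev:", std_dev)

# np.where picks a value per element based on a condition;
# the threshold of 80 is an assumption for illustration
data["Result"] = np.where(data["Scores"] >= 80, "Pass", "Fail")
print(data)
```

One detail to keep in mind: np.std computes the population standard deviation by default (ddof=0), while the Pandas Series.std method defaults to the sample standard deviation (ddof=1), so the two can give different numbers for the same column.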

5. Summary & Best Practices

  1. NumPy: Efficient numerical computation with arrays and matrices.
  2. Pandas: Structured data analysis using Series and DataFrames.

Best Practices:

  1. Use vectorized operations instead of loops.
  2. Clean and validate data before analysis.
  3. Comment your code and use descriptive variable names.

By mastering NumPy and Pandas, beginners can efficiently handle, analyze, and process data, preparing for AI, data science, and machine learning projects.