
Chapter 16: Data Science and Machine Learning with Python

Python is widely used for data science and machine learning, largely because of its mature library ecosystem. This chapter introduces the core libraries: numpy and pandas for processing and analyzing data, matplotlib and seaborn for visualization, and scikit-learn for building predictive models.


Data Science with Python

Numpy: Numerical Computing

numpy is a library for fast, vectorized numerical computation on arrays, which matters most when working with large datasets.

Key Features:

  • Multi-dimensional arrays (ndarray).

  • Mathematical operations.

Example:

import numpy as np

# Create an array
array = np.array([1, 2, 3, 4])

# Perform operations
print(array + 10)  # Output: [11 12 13 14]
print(np.mean(array))  # Output: 2.5
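
The example above uses a one-dimensional array. As a minimal sketch of the multi-dimensional arrays listed under Key Features, the following applies the same kind of operations to a 2-D array along a chosen axis. The variable names (matrix, col_means, row_sums) are illustrative only.

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.shape)  # (2, 3)

# Element-wise arithmetic works the same way as on 1-D arrays
doubled = matrix * 2
print(doubled)

# axis=0 aggregates down the rows (one result per column),
# axis=1 aggregates across the columns (one result per row)
col_means = matrix.mean(axis=0)  # column means: 2.5, 3.5, 4.5
row_sums = matrix.sum(axis=1)    # row sums: 6, 15
print(col_means)
print(row_sums)
```

The axis argument is the usual source of confusion: it names the dimension that is collapsed, not the one that is kept.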

Pandas: Data Manipulation

pandas is a library for working with structured data using DataFrames.

Key Features:

  • Reading and writing data files (CSV, Excel).

  • Data cleaning and transformation.

Example:

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)

# Access data
print(df['Name'])  # Output: Series of names

# Filter data
filtered = df[df['Age'] > 25]
print(filtered)
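
The example above builds a DataFrame from a dictionary. The other two Key Features, reading and writing CSV data and cleaning it, could look roughly like the sketch below. The CSV content is made up for illustration and is read from an in-memory string so the snippet runs without a file on disk; in practice you would pass a file path to read_csv and to_csv.

```python
import io

import pandas as pd

# Hypothetical CSV content; Bob's age is deliberately missing
csv_text = """Name,Age,City
Alice,25,Paris
Bob,,Berlin
Carol,35,Paris
"""

# Reading: read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Cleaning: fill the missing age with the column median
df["Age"] = df["Age"].fillna(df["Age"].median())

# Transformation: average age per city
mean_age_by_city = df.groupby("City")["Age"].mean()
print(mean_age_by_city)

# Writing: without a path, to_csv returns the CSV as a string
csv_out = df.to_csv(index=False)
print(csv_out)
```

Filling with the median is only one possible strategy; dropping incomplete rows with dropna() is another, and the right choice depends on the data.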

Matplotlib and Seaborn: Data Visualization

  • matplotlib: General-purpose plotting with fine-grained control over figures and axes.

  • seaborn: Higher-level statistical visualizations, built on top of matplotlib.

Example:

import matplotlib.pyplot as plt
import seaborn as sns

# Plot with matplotlib
plt.plot([1, 2, 3], [4, 5, 6])
plt.title("Basic Plot")
plt.show()

# Plot with seaborn
sns.barplot(x=['A', 'B', 'C'], y=[4, 5, 6])
plt.show()
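
For scripts that run without a display (a server or a scheduled job, for instance), plt.show() has nothing to show. A hedged sketch of an alternative is to pick the non-interactive "Agg" backend and save the figure instead. The sample ages are made up, and the figure is written to an in-memory buffer here only to keep the snippet self-contained; a file name such as "plots.png" (hypothetical) works the same way.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

ages = [22, 25, 25, 30, 31, 35, 40, 41, 47, 52]  # made-up sample data

# Two plots side by side, each with labelled axes
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

ax_line.plot([1, 2, 3], [4, 5, 6], marker="o")
ax_line.set_title("Line plot")
ax_line.set_xlabel("x")
ax_line.set_ylabel("y")

counts, bin_edges, _ = ax_hist.hist(ages, bins=5)
ax_hist.set_title("Age distribution")
ax_hist.set_xlabel("Age")
ax_hist.set_ylabel("Count")

fig.tight_layout()

# Save as PNG; pass a file name instead of the buffer to write to disk
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```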

Machine Learning with Python

Scikit-Learn: Core ML Library

scikit-learn provides classical machine learning models behind a consistent interface (fit, then predict), along with utilities for splitting data and evaluating results.

Steps in Machine Learning:

  1. Load Data:

    from sklearn.datasets import load_iris
    
    data = load_iris()
    print(data['feature_names'])
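  2. Preprocess Data (split into training and test sets):

    from sklearn.model_selection import train_test_split
    
    # Hold out 20% of the samples for testing; random_state makes the split reproducible
    X_train, X_test, y_train, y_test = train_test_split(data['data'], data['target'], test_size=0.2, random_state=42)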
  3. Build Model:

    from sklearn.ensemble import RandomForestClassifier
    
    model = RandomForestClassifier()
    model.fit(X_train, y_train)
  4. Evaluate Model:

    from sklearn.metrics import accuracy_score
    
    predictions = model.predict(X_test)
    print(accuracy_score(y_test, predictions))
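
The four steps above can be combined into one script. This is a minimal sketch rather than a recommended configuration: random_state=42, n_estimators=100, and the stratify argument are choices made here for reproducibility and balanced classes, not settings taken from the steps themselves.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Load data
data = load_iris()

# 2. Split data; stratify keeps the class proportions equal in both sets
X_train, X_test, y_train, y_test = train_test_split(
    data["data"], data["target"],
    test_size=0.2, random_state=42, stratify=data["target"],
)

# 3. Build model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# 4. Evaluate model on data it has not seen during training
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

Fixing random_state means repeated runs produce the same split and the same forest, which makes results easier to compare while experimenting.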

Hands-On Exercises

Exercise 1: Analyze a CSV File

Load a CSV file with pandas and display basic statistics.

Solution:

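import pandas as pd

df = pd.read_csv('data.csv')  # replace 'data.csv' with the path to your file
print(df.describe())  # summary statistics for the numeric columns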

Exercise 2: Train a Linear Regression Model

Use scikit-learn to predict housing prices.

Solution:

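from sklearn.linear_model import LinearRegression

# X_train and y_train must hold housing features and prices, split with
# train_test_split as in step 2 of the scikit-learn walkthrough. The iris
# split from that walkthrough is a classification problem and does not apply here.
model = LinearRegression()
model.fit(X_train, y_train)
print(model.coef_)  # one coefficient per feature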
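
The solution relies on X_train and y_train already containing housing data. Since the exercise does not name a dataset, the sketch below generates a small synthetic one so the whole thing runs end to end. The features (area, rooms), the price formula, and the noise level are all invented for illustration and are not real housing figures.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic, made-up housing data (NOT a real dataset):
# price = 3000 * area + 10000 * rooms + noise
rng = np.random.default_rng(42)
area = rng.uniform(50, 200, size=200)   # floor area in square metres
rooms = rng.integers(1, 6, size=200)    # 1 to 5 rooms
noise = rng.normal(0, 5000, size=200)
price = 3000 * area + 10000 * rooms + noise

X = np.column_stack([area, rooms])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# The learned coefficients should land near the 3000 and 10000 used above
print(model.coef_)

# R^2 on the held-out test set: 1.0 would be a perfect fit
r2 = r2_score(y_test, model.predict(X_test))
print(r2)
```

Because the true coefficients are known here, comparing them with model.coef_ is a quick sanity check that the fit behaves as expected before moving to real data.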

Exercise 3: Visualize Data

Plot a histogram of ages using matplotlib.

Solution:

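import matplotlib.pyplot as plt

# df is a DataFrame with an 'Age' column, such as the one built in the pandas example
plt.hist(df['Age'], bins=10)
plt.title("Age Distribution")
plt.show()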

Best Practices

  1. Data Cleaning: Handle missing values and outliers before analysis.

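  2. Feature Scaling: Normalize or standardize features for models that are sensitive to feature scale, such as linear models with regularization, k-nearest neighbors, and support vector machines. Tree-based models like random forests generally do not need it.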

  3. Validation: Use cross-validation to assess model performance.

  4. Documentation: Annotate data analysis steps for reproducibility.
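
The first three practices can be sketched together with a scikit-learn pipeline. This is one possible arrangement, not the only one: the missing values are injected artificially for the demonstration, and median imputation, LogisticRegression, and 5-fold cross-validation are choices made here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Simulate missing values in the first feature (illustration only)
rng = np.random.default_rng(0)
X = X.copy()
missing_rows = rng.choice(X.shape[0], size=10, replace=False)
X[missing_rows, 0] = np.nan

# Data cleaning -> feature scaling -> model, applied in that order
pipeline = make_pipeline(
    SimpleImputer(strategy="median"),  # fill missing values
    StandardScaler(),                  # zero mean, unit variance
    LogisticRegression(max_iter=1000),
)

# Validation: 5-fold cross-validation; the imputer and scaler are
# re-fitted on each training fold, so no test-fold data leaks in
scores = cross_val_score(pipeline, X, y, cv=5)
print(scores)
print(scores.mean())
```

Keeping the preprocessing inside the pipeline, rather than applying it to the full dataset first, is what prevents information from the validation folds from influencing the imputation and scaling.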

In the next chapter, we will dive deeper into working with APIs, including consuming and building REST APIs with Python.

