NYSE-Pharma-Performance-LR-Model

Linear Regression Model for Predicting Pharmaceutical Sector Performance on the New York Stock Exchange

Project Overview

This project develops a linear regression model to predict pharmaceutical sector performance using economic, market, and industry-specific indicators.

Installation
Project Structure
Outline
Usage
Data
Model
Results
License
Contact

Installation

git clone https://github.com/wusinyee/NYSE-Pharma-Performance-LR-Model.git
cd NYSE-Pharma-Performance-LR-Model
pip install -r requirements.txt

Project Structure

NYSE-Pharma-Performance-LR-Model/
│
├── data/
│   ├── raw/
│   │   └── .gitkeep
│   └── processed/
│       └── .gitkeep
│
├── notebooks/
│   ├── 1.0-data-preprocessing.ipynb
│   ├── 2.0-exploratory-data-analysis.ipynb
│   └── 3.0-model-development.ipynb
│
├── src/
│   ├── data/
│   │   ├── __init__.py
│   │   └── preprocess.py
│   ├── features/
│   │   ├── __init__.py
│   │   └── build_features.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── train_model.py
│   │   └── predict_model.py
│   └── visualization/
│       ├── __init__.py
│       └── visualize.py
│
├── tests/
│   ├── __init__.py
│   ├── test_data.py
│   ├── test_features.py
│   └── test_models.py
│
├── .gitignore
├── LICENSE
├── README.md
├── requirements.txt
└── setup.py

This structure follows best practices for organizing a data science project:

- data/: Stores raw and processed data files.
- notebooks/: Contains Jupyter notebooks for exploration and analysis.
- src/: Houses the main source code of the project.
- tests/: Includes unit tests for the different components.
- Root directory: Holds the files for project setup and documentation.

Outline

New York Stock Exchange Pharmaceutical Performance Linear Regression Project Outline

Data Collection and Preparation
  a. Stock Data Collection
    - NYSE historical dataset from Kaggle
    - S&P 500 index data
    - API-fetched pharmaceutical company data
  b. Economic Data Collection
  c. Healthcare Data Collection
  d. Market Sentiment Data Collection
  e. Data Preprocessing
  f. Data Integration
  g. Data Quality Checks
  h. Feature Engineering
  i. Data Documentation
Exploratory Data Analysis (EDA)
  a. Analyze variable distributions
  b. Investigate correlations
  c. Examine time series characteristics
  d. Visualize key relationships
Feature Selection
  a. Statistical methods (correlation, VIF, mutual information)
  b. Domain knowledge application
Model Development
  a. Data splitting (train, validation, test)
  b. Baseline model implementation
  c. Advanced model development
    - Linear models (Ridge, Lasso)
    - Tree-based models (Random Forest, Gradient Boosting)
    - Support Vector Regression
    - Neural Networks
  d. Cross-validation
Model Optimization
  a. Hyperparameter tuning
  b. Ensemble methods exploration
Model Evaluation and Selection
  a. Performance metric comparison
  b. Model interpretability assessment
  c. Final model selection
Model Interpretation
  a. Feature importance analysis
  b. SHAP value analysis
Model Validation
  a. Test set evaluation
  b. Backtesting
  c. Sensitivity analysis
Deployment Planning
  a. Deployment system design
  b. Infrastructure setup
  c. Prediction pipeline development
Documentation and Reporting
  a. Technical documentation
  b. Final report preparation
  c. Visualization creation
Stakeholder Presentation
  a. Presentation preparation
  b. Key findings and results communication
Model Deployment
  a. Implementation of deployment system
  b. Testing and quality assurance
Monitoring and Maintenance
  a. Performance monitoring setup
  b. Retraining schedule establishment
  c. Version control implementation
Compliance and Ethics
  a. Regulatory compliance review
  b. Fairness and bias assessment
  c. Ethical use guidelines development
Knowledge Transfer
  a. User guide creation
  b. Training session conduction
  c. Support system setup
Impact Assessment
  a. Model impact measurement
  b. Efficiency gains quantification
  c. Stakeholder feedback collection
Iterative Improvement
  a. Regular performance reviews
  b. Continuous improvement implementation
Scaling and Expansion
  a. Scalability assessment
  b. Expansion roadmap development
Project Closure
  a. Comprehensive project review
  b. Lessons learned documentation
  c. Formal project closure
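
The data splitting step under Model Development can be illustrated with a short sketch. Because the data is a time series, the example assumes the split should keep rows in chronological order, so that validation and test rows are always later than training rows; the outline itself does not state this. The function name `chronological_split` and its parameters are hypothetical and are not taken from the repository.

```python
def chronological_split(rows, val_size, test_size):
    """Split time-ordered rows into train, validation, and test sets.

    No shuffling is applied: the last `test_size` rows form the test set,
    the `val_size` rows before them form the validation set, and the rest
    is used for training.
    """
    if val_size < 0 or test_size < 0:
        raise ValueError("val_size and test_size must be non-negative")
    if val_size + test_size >= len(rows):
        raise ValueError("not enough rows left for the training set")
    train_end = len(rows) - val_size - test_size
    val_end = len(rows) - test_size
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


# Ten placeholder trading days, oldest first.
days = list(range(10))
train, val, test = chronological_split(days, val_size=2, test_size=2)
```

Sizes are given as row counts rather than fractions to avoid rounding surprises. The same idea extends to backtesting by repeating the split over a sliding window.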

Usage

Run data preprocessing: python src/data/preprocess.py
Perform EDA: jupyter notebook notebooks/2.0-exploratory-data-analysis.ipynb
Train the model: python src/models/train_model.py
Make predictions: python src/models/predict_model.py

Data

Data sources: NYSE, FDA, U.S. Bureau of Economic Analysis
Features: stock prices, economic indicators, FDA approvals
Target variable: Pharmaceutical sector daily returns
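
As an illustration of the target variable, the sketch below derives daily returns from closing prices. It assumes simple (not log) returns and an equal-weighted average across companies, neither of which is confirmed by the project; the function names, tickers, and prices are made up for the example.

```python
from statistics import fmean


def daily_returns(prices):
    """Simple daily returns: (p_t - p_{t-1}) / p_{t-1} for consecutive closes."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def sector_daily_returns(prices_by_ticker):
    """Equal-weighted average of per-company daily returns (an assumption)."""
    per_ticker = [daily_returns(p) for p in prices_by_ticker.values()]
    # zip(*...) groups the returns of all companies for the same day.
    return [fmean(day) for day in zip(*per_ticker)]


# Made-up closing prices for two placeholder tickers, oldest first.
sample = {
    "AAA": [100.0, 110.0, 99.0],
    "BBB": [50.0, 50.0, 55.0],
}
sector = sector_daily_returns(sample)
```

A series of n prices yields n - 1 returns, so the first trading day has no target value and would be dropped when features and target are aligned.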

Model

Algorithm: Linear Regression
Key features: [List top 5 features]
Performance metrics: R-squared, MAE, RMSE
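
To make the algorithm and the three metrics concrete, here is a minimal sketch of a single-feature least-squares fit together with R-squared, MAE, and RMSE, using only the Python standard library. It is not the project's training code: the actual model presumably uses several features and a library implementation, and the names `fit_simple_ols` and `regression_metrics` as well as the sample numbers are hypothetical.

```python
from math import sqrt
from statistics import fmean


def fit_simple_ols(x, y):
    """Return (slope, intercept) of the least-squares line for one feature."""
    x_mean, y_mean = fmean(x), fmean(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def regression_metrics(y_true, y_pred):
    """Compute R-squared, mean absolute error, and root mean squared error."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    y_mean = fmean(y_true)
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": fmean([abs(e) for e in errors]),
        "rmse": sqrt(ss_res / len(errors)),
    }


# Made-up data: a placeholder indicator x and a placeholder target y.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.0]
slope, intercept = fit_simple_ols(x, y)
predictions = [intercept + slope * xi for xi in x]
metrics = regression_metrics(y, predictions)
```

In practice the metrics would be computed on held-out data rather than on the rows used for fitting, as the example does here for brevity.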

Results

[Brief summary of model performance and key insights]

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

Mandy Wu - [email protected]

Project Link: https://github.com/wusinyee/NYSE-Pharma-Performance-LR-Model
