A MATLAB ML model that predicts Codeforces submission outcomes.
This MATLAB-based machine learning project classifies Codeforces submissions into three categories:
- Accepted (Correct Solutions)
- Bug (Incorrect Solutions: WA, RTE)
- Inefficiency (Performance Issues: TLE, MLE)
Using features such as execution time, memory usage, and problem rating, it trains four classifiers (Decision Tree, Random Forest, KNN, and SVM) to predict whether a new submission is likely to be correct, buggy, or inefficient. The project uses cross-validation, feature scaling, and confusion matrices for evaluation, which makes it useful for analyzing trends in competitive programming performance.
Retrieves your Codeforces submission data using the Codeforces API.
Extracts several numerical features from each submission:
- `PassedTestCount`: Number of test cases passed.
- `TimeConsumedMillis`: Execution time in milliseconds.
- `MemoryConsumedBytes`: Memory usage in bytes.
- `RelativeTimeSeconds`: Time relative to the contest start.
- `ProblemRating`: Difficulty rating of the problem (if available or simulated).
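The retrieval and feature-extraction steps above might look roughly like the sketch below. It is not taken from `CodeforcesSubmissionClassifier.m`. The inline JSON is a made-up sample shaped like a Codeforces `user.status` response, so the snippet runs without network access. The handle `yourHandle` in the comment is a placeholder, and filling a missing rating with `NaN` is an assumption (the script may simulate a rating instead).

```matlab
% Hypothetical sample shaped like a Codeforces user.status response.
% For live data, a call such as
%   resp = webread('https://codeforces.com/api/user.status', 'handle', 'yourHandle');
% should return a struct with the same layout ('yourHandle' is a placeholder).
sampleJson = ['{"status":"OK","result":[' ...
    '{"passedTestCount":12,"timeConsumedMillis":46,"memoryConsumedBytes":102400,' ...
    '"relativeTimeSeconds":1800,"verdict":"OK","problem":{"rating":1200}},' ...
    '{"passedTestCount":3,"timeConsumedMillis":2000,"memoryConsumedBytes":204800,' ...
    '"relativeTimeSeconds":5400,"verdict":"TIME_LIMIT_EXCEEDED","problem":{"name":"Example"}}]}'];
resp = jsondecode(sampleJson);

% jsondecode returns a struct array when all entries share the same fields
% and a cell array otherwise, so normalize to a cell array of structs.
subs = resp.result;
if ~iscell(subs)
    subs = num2cell(subs);
end

n = numel(subs);
X = zeros(n, 5);
verdicts = cell(n, 1);
for i = 1:n
    s = subs{i};
    rating = NaN;  % assumption: mark a missing rating as NaN
    if isfield(s, 'problem') && isfield(s.problem, 'rating')
        rating = s.problem.rating;
    end
    X(i, :) = [s.passedTestCount, s.timeConsumedMillis, ...
               s.memoryConsumedBytes, s.relativeTimeSeconds, rating];
    verdicts{i} = s.verdict;
end

features = array2table(X, 'VariableNames', {'PassedTestCount', ...
    'TimeConsumedMillis', 'MemoryConsumedBytes', 'RelativeTimeSeconds', 'ProblemRating'});
disp(features);
```

Keeping the verdict strings in a separate cell array leaves the numeric feature table ready for scaling and training.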
Each submission is labeled into one of three categories:
- Accepted (0)
- Bug (1)
- Inefficiency (2)
The dataset is balanced across the three classes if possible.
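A minimal sketch of the labeling and balancing described above, under stated assumptions: the verdict strings follow the Codeforces API naming, verdicts outside the three categories are dropped, and balancing is done by random undersampling to the smallest class. The actual script may balance the data differently.

```matlab
% Made-up verdict list for illustration.
verdicts = {'OK'; 'WRONG_ANSWER'; 'TIME_LIMIT_EXCEEDED'; 'RUNTIME_ERROR'; ...
            'OK'; 'MEMORY_LIMIT_EXCEEDED'; 'OK'; 'COMPILATION_ERROR'};

% Map verdicts to 0 = Accepted, 1 = Bug, 2 = Inefficiency.
labels = nan(numel(verdicts), 1);
labels(ismember(verdicts, {'OK'})) = 0;
labels(ismember(verdicts, {'WRONG_ANSWER', 'RUNTIME_ERROR'})) = 1;
labels(ismember(verdicts, {'TIME_LIMIT_EXCEEDED', 'MEMORY_LIMIT_EXCEEDED'})) = 2;

% Assumption: submissions with any other verdict are discarded.
keep = ~isnan(labels);
labels = labels(keep);

% Assumption: balance by undersampling every class to the smallest class size.
rng(0);  % reproducible sampling
classes = unique(labels);
counts = arrayfun(@(c) sum(labels == c), classes);
nPerClass = min(counts);

balancedIdx = [];
for c = classes'
    idx = find(labels == c);
    pick = idx(randperm(numel(idx), nPerClass));
    balancedIdx = [balancedIdx; pick(:)]; %#ok<AGROW>
end
balancedLabels = labels(balancedIdx);
```

The same `keep` and `balancedIdx` index vectors would be applied to the rows of the feature matrix so that features and labels stay aligned.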
Four classifiers are trained using 5-fold cross-validation (or a hold-out split if data is limited):
- Decision Tree (with pruning)
- Random Forest (bagging with 150 cycles)
- K-Nearest Neighbors (KNN with 7 neighbors)
- Support Vector Machine (SVM, implemented via `fitcecoc` for multi-class classification)
Feature scaling (z-score normalization) is applied within each fold.
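The training loop described above could be sketched as follows. This is an illustration on synthetic data, not the project's code: the feature matrix is random, `'Prune', 'on'` is one possible reading of "with pruning", and `fitcensemble` with `'Method', 'Bag'` is assumed for the Random Forest (the script may use `TreeBagger` instead). The key point is that the z-score statistics come from the training fold only and are then reused on the test fold.

```matlab
rng(1);                                   % reproducible synthetic data
n = 150;
y = repelem([0; 1; 2], n/3);              % 0 = Accepted, 1 = Bug, 2 = Inefficiency
X = (randn(n, 5) + y) .* [1 100 1e6 1000 300];  % synthetic features on very different scales

cvp = cvpartition(y, 'KFold', 5);         % stratified 5-fold split

names = {'Decision Tree'; 'Random Forest'; 'KNN'; 'SVM'};
trainers = cell(4, 1);
trainers{1} = @(Xt, yt) fitctree(Xt, yt, 'Prune', 'on');
trainers{2} = @(Xt, yt) fitcensemble(Xt, yt, 'Method', 'Bag', 'NumLearningCycles', 150);
trainers{3} = @(Xt, yt) fitcknn(Xt, yt, 'NumNeighbors', 7);
trainers{4} = @(Xt, yt) fitcecoc(Xt, yt);  % multi-class SVM via error-correcting output codes

acc = zeros(numel(trainers), cvp.NumTestSets);
for k = 1:cvp.NumTestSets
    trIdx = training(cvp, k);
    teIdx = test(cvp, k);

    % Fit the scaling on the training fold only, then apply it to the test fold.
    [Xtr, mu, sigma] = zscore(X(trIdx, :));
    sigma(sigma == 0) = 1;                % guard against constant columns
    Xte = (X(teIdx, :) - mu) ./ sigma;

    for m = 1:numel(trainers)
        mdl = trainers{m}(Xtr, y(trIdx));
        acc(m, k) = mean(predict(mdl, Xte) == y(teIdx));
    end
end

results = table(names, mean(acc, 2), 'VariableNames', {'Model', 'Accuracy'});
disp(results);
```

Scaling inside the loop avoids leaking test-fold statistics into training. For a small dataset, `cvpartition(y, 'HoldOut', 0.3)` would give a single hold-out split in place of the five folds (the 0.3 fraction is an arbitrary example).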
Aggregated confusion matrices are computed over all folds, and several figures are generated to visualize:
- Scatter plot of Passed Test Count vs. Time Consumed.
- Histogram of Problem Ratings.
- Confusion charts for each classifier.
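As a rough illustration of the aggregated confusion matrix, the sketch below sums per-fold matrices from `confusionmat` and plots the total with `confusionchart`. The per-fold label vectors are invented for the example, and only two folds are shown to keep it short.

```matlab
classOrder = [0 1 2];
classNames = {'Accepted', 'Bug', 'Inefficiency'};

% Hypothetical true and predicted labels for two folds.
foldTrue = {[0; 1; 2; 0], [1; 2; 2; 0]};
foldPred = {[0; 1; 1; 0], [1; 2; 0; 0]};

% Sum the per-fold confusion matrices (rows = true class, columns = predicted class).
cmTotal = zeros(numel(classOrder));
for k = 1:numel(foldTrue)
    cmTotal = cmTotal + confusionmat(foldTrue{k}, foldPred{k}, 'Order', classOrder);
end

overallAccuracy = trace(cmTotal) / sum(cmTotal(:));

figure;
confusionchart(cmTotal, classNames, ...
    'RowSummary', 'row-normalized', ...
    'ColumnSummary', 'column-normalized', ...
    'Title', 'Aggregated Confusion Matrix (example)');
```

Passing a fixed `'Order'` keeps the matrix 3-by-3 even when a fold happens to miss one of the classes.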
- Log in to your MATLAB Online account.
- Create a new script and paste the content from `CodeforcesSubmissionClassifier.m`.
- Save the file with a `.m` extension.
Run the following in the Command Window:
clear; clc;
Click the Run button in the Editor or type the script name in the Command Window:
CodeforcesSubmissionClassifier
- The script will display average accuracy for each classifier in the Command Window.
- Figures will be generated for:
  - Scatter Plot of Passed Test Count vs. Time Consumed.
  - Histogram of Problem Ratings.
  - Aggregated Confusion Matrices for Decision Tree, Random Forest, KNN, and SVM.
Passed Test Count vs. Time Consumed (ms) Colored by Submission Category.

Distribution of Problem Ratings.

Figures: aggregated confusion matrices (row- and column-normalized) for each classifier.

After running the script, you will see output similar to:

          Model          Accuracy
    _________________    ________
    {'Decision Tree'}     0.7891
    {'Random Forest'}    0.87775
    {'KNN'          }    0.83945
    {'SVM'          }    0.83537

These results indicate the average accuracy across cross-validation folds for each model. You can further tune the hyperparameters or improve feature extraction to enhance model performance.
Consider extracting features from the actual source code (if available), such as:
- Code length
- Loop nesting depth
- Cyclomatic complexity
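To give a feel for what such source-code features could look like, here is a deliberately crude sketch for a C-like submission. Everything in it is an assumption for illustration: the sample source is invented, brace depth stands in for loop nesting depth, and the complexity estimate simply counts decision keywords and short-circuit operators, which a real parser would handle more accurately.

```matlab
% Invented C-like source used only for illustration.
src = sprintf(['int main() {\n' ...
               '  for (int i = 0; i < n; i++) {\n' ...
               '    while (a[i] > 0) {\n' ...
               '      if (a[i] %% 2) a[i]--;\n' ...
               '    }\n' ...
               '  }\n' ...
               '}\n']);

% Code length in characters.
codeLength = numel(src);

% Proxy for nesting depth: maximum depth of curly braces
% (includes the function body, so it overestimates loop nesting by one here).
depthDelta = (src == '{') - (src == '}');
maxBraceDepth = max(cumsum(depthDelta));

% Rough cyclomatic complexity: decision points + 1.
decisionTokens = regexp(src, '\<(if|for|while|case)\>|&&|\|\|', 'match');
approxComplexity = numel(decisionTokens) + 1;

fprintf('length=%d, brace depth=%d, approx. complexity=%d\n', ...
    codeLength, maxBraceDepth, approxComplexity);
```

Text-based counts like these ignore comments and string literals, so they are best treated as noisy features.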
Use grid search or Bayesian optimization to fine-tune model parameters.
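One way to try this with the Statistics and Machine Learning Toolbox is the built-in `'OptimizeHyperparameters'` option. The sketch below tunes only the number of neighbors for KNN on synthetic data. The data, the evaluation budget of 15, and the choice to tune a single parameter are all assumptions made for the example.

```matlab
rng(2);                                   % reproducible synthetic data
y = repelem([0; 1; 2], 40);               % 0 = Accepted, 1 = Bug, 2 = Inefficiency
X = randn(120, 5) + y;                    % synthetic, class-shifted features

% Bayesian optimization over NumNeighbors with 5-fold cross-validation.
% Setting 'Optimizer' to 'gridsearch' would run a grid search instead.
opts = struct('Optimizer', 'bayesopt', ...
              'MaxObjectiveEvaluations', 15, ...
              'Kfold', 5, ...
              'ShowPlots', false, ...
              'Verbose', 0);

mdl = fitcknn(X, y, ...
    'OptimizeHyperparameters', {'NumNeighbors'}, ...
    'HyperparameterOptimizationOptions', opts);

bestK = mdl.NumNeighbors;
fprintf('Selected NumNeighbors: %d\n', bestK);
```

The same option is accepted by `fitctree`, `fitcensemble`, and `fitcecoc`, each with its own set of tunable parameters.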
Incorporate submissions from multiple users or contests to increase data diversity.
This project is licensed under the MIT License. See the `LICENSE` file for details.
- Dr. Osama Farouk: For his invaluable insights and guidance in Information Theory.
- Dr. Haidy Saeed: For her dedicated support and inspirational motivation throughout my learning journey.