Data-Driven Prediction of Surgery Times from Clinical and Operational Features
17.09.2025, Theses, Bachelor's and Master's Theses
<section id="thesis-description">
<h2>Background</h2>
<p>Efficient planning and scheduling in hospitals depend strongly on accurate estimates of surgery durations. However, actual surgery times often vary with patient characteristics, surgical procedures, and operational factors, which introduces uncertainty into operating room planning. Better duration estimates can help reduce delays, optimize resource allocation, and improve patient care. Using a rich hospital dataset, this thesis project will develop data-driven methods to estimate surgery durations from relevant features (e.g., procedure type, surgeon features, patient characteristics).</p>
<h2>Task</h2>
<p>The objective of this thesis is to develop and evaluate predictive models that estimate surgery durations from clinical and operational features in the hospital dataset. The student will carry out data preprocessing and feature engineering, and will apply statistical and machine learning techniques to model the relationship between surgery characteristics and duration. Model performance will be rigorously evaluated, and feature importance will be analyzed. In addition, the student will design a reusable prediction pipeline that can be applied to new data, enabling ongoing estimation of surgery durations. The expected outcome is a validated methodology and a comprehensive list of surgeries with their estimated durations, providing both practical support for hospital scheduling and a contribution to data-driven healthcare operations research.</p>
<h2>Subtasks</h2>
<ul>
<li>Review literature on surgery duration estimation and predictive modeling in healthcare.</li>
<li>Explore and preprocess the dataset, handling missing values and standardizing formats.</li>
<li>Identify relevant features affecting surgery durations (e.g., procedure type, patient characteristics, surgeon, time of day).</li>
<li>Transform categorical and numerical variables appropriately and assess feature importance.</li>
<li>Implement statistical and machine learning models such as linear regression, random forest, or gradient boosting.</li>
<li>Train models on historical data and tune hyperparameters for optimal performance.</li>
<li>Evaluate model accuracy using metrics like RMSE or MAE and perform cross-validation.</li>
<li>Compare different models and select the best-performing approach.</li>
<li>Develop a reusable prediction pipeline that can be applied to new data.</li>
<li>Produce a list of surgeries with their estimated durations.</li>
<li>Summarize findings, insights, and recommendations for hospital scheduling.</li>
</ul>
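<p>The evaluation subtasks above (cross-validation with MAE and RMSE) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method the thesis prescribes: the records are synthetic, the procedure names and typical durations are invented, and the "model" is a simple per-procedure mean baseline against which regression or tree-based models could later be compared. All function names are hypothetical.</p>

```python
import math
import random
from collections import defaultdict
from statistics import mean


def make_synthetic_records(n=200, seed=0):
    """Generate (procedure_type, duration_minutes) pairs.

    Assumption: purely synthetic data with invented procedure names and
    typical durations; it stands in for the real hospital dataset.
    """
    rng = random.Random(seed)
    typical = {"appendectomy": 60.0, "hip_replacement": 120.0, "cataract": 30.0}
    records = []
    for _ in range(n):
        proc = rng.choice(sorted(typical))
        duration = max(5.0, rng.gauss(typical[proc], 0.2 * typical[proc]))
        records.append((proc, duration))
    return records


def fit_baseline(train):
    """Fit the baseline: mean duration per procedure plus an overall mean."""
    by_proc = defaultdict(list)
    for proc, duration in train:
        by_proc[proc].append(duration)
    overall = mean(d for _, d in train)
    return {p: mean(ds) for p, ds in by_proc.items()}, overall


def predict(model, proc):
    """Predict a duration; unseen procedures fall back to the overall mean."""
    per_proc, overall = model
    return per_proc.get(proc, overall)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def cross_validate(records, k=5, seed=0):
    """Return a list of (MAE, RMSE) tuples, one per fold of a k-fold split."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = fit_baseline(train)
        y_true = [d for _, d in test]
        y_pred = [predict(model, p) for p, _ in test]
        scores.append((mae(y_true, y_pred), rmse(y_true, y_pred)))
    return scores


if __name__ == "__main__":
    fold_scores = cross_validate(make_synthetic_records())
    print("mean MAE  (min):", round(mean(s[0] for s in fold_scores), 2))
    print("mean RMSE (min):", round(mean(s[1] for s in fold_scores), 2))
```

<p>A baseline of this kind is useful mainly as a reference point: a more complex model is only worth its added complexity if it clearly improves on the per-procedure mean under the same cross-validation split.</p>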
</section>
Contact: sidra.rashid@tum.de