The final diagnosis was divided into five tumour types: benign, borderline, stage I invasive, stage II-IV invasive, and secondary metastatic cancer.

Data were entered through dedicated and secure data collection systems: web based for phase 1, and through a local study screen (Astraia software, Munich, Germany) for later phases.21 22 23 To ensure data integrity, several clinicians and statisticians used built-in automatic checks and manually reviewed and cleaned the data.

Statistical analysis

We developed a prediction model using data from the women included in IOTA phases 1, 1b, and 2 (n=3506) and validated the model on data from the women included in phase 3 (n=2403).

The serum CA 125 tumour marker was not a mandatory variable, and measurements were missing in 31% of the patients. As described in detail in supplementary appendix A, we used multiple imputation to deal with missing values for CA 125.28 We created 100 imputations, resulting in 100 completed datasets.

We selected variables in two stages (see supplementary appendix B for details). Firstly, to avoid overfitting, we reduced the number of potential predictors to 10 based on subject matter knowledge29 30 and the stability of the predictors over centres.31 We selected four clinical variables: age (years), serum CA 125 level (U/mL), family history of ovarian cancer (yes/no), and type of centre (oncology centre v other hospitals). We also selected six ultrasound variables: the maximum diameter of the lesion (mm), proportion of solid tissue (that is, the maximum diameter of the largest solid component divided by the maximum diameter of the lesion), presence of more than 10 cyst locules (yes/no), number of papillary projections (0, 1, 2, 3, >3), presence of acoustic shadows (yes/no), and presence of ascites (yes/no).
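Two of the ultrasound variables are derived from raw measurements rather than recorded directly, and a short sketch may make their definitions concrete. The Python below is only an illustration: the function names, the input checks, and the example values are assumptions and are not taken from the study's data collection software or analysis code.

```python
def proportion_solid_tissue(solid_max_diameter_mm: float,
                            lesion_max_diameter_mm: float) -> float:
    """Maximum diameter of the largest solid component divided by the
    maximum diameter of the lesion, as defined in the text.

    The input checks below are an assumption added for illustration.
    """
    if lesion_max_diameter_mm <= 0:
        raise ValueError("lesion diameter must be positive")
    if solid_max_diameter_mm < 0:
        raise ValueError("solid component diameter cannot be negative")
    return solid_max_diameter_mm / lesion_max_diameter_mm


def papillary_category(count: int) -> str:
    """Map a raw count of papillary projections onto the five categories
    listed in the text: 0, 1, 2, 3, and >3."""
    if count < 0:
        raise ValueError("count cannot be negative")
    return str(count) if count <= 3 else ">3"


# Hypothetical lesion: 80 mm overall, largest solid component 20 mm,
# seven papillary projections.
example_proportion = proportion_solid_tissue(20.0, 80.0)
example_category = papillary_category(7)
print(example_proportion, example_category)
```

A lesion with no solid component gives a proportion of 0, and an entirely solid lesion gives a proportion of 1.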

