Risk prediction models for colorectal cancer: A scoping review

Background and objectives: Many risk prediction models have been developed globally to identify populations at high risk for colorectal cancer in specific settings. Documenting the available evidence from existing studies provides a useful information base. We performed a scoping review to identify and analyse published risk prediction models for colorectal cancer worldwide.
Methods: A scoping review was undertaken to address the question ‘what are the existing risk prediction models to identify the risk of developing colorectal cancer among individuals in different countries and settings?’ using the framework developed by Arksey and O’Malley for scoping reviews. We limited the search to literature in the English language and included both observational and interventional studies. Titles and abstracts were reviewed using predetermined screening criteria. Forty-one articles, identified from database searches and additional searches, were included in the review.
Results: Of the 58 risk prediction models identified, the largest group was developed for colorectal cancer, followed by advanced colorectal cancer. Most of the articles reviewed were cross-sectional or cohort studies. Statistical methods such as multiple logistic regression were used by a majority, while a few incorporated non-statistical methods such as consensus methods and extraction of data from published literature. The authors of the 58 risk prediction models considered 77 different risk factors, excluding genetic variants.
Conclusions: This comprehensive scoping review demonstrates the capacity of the existing risk models to stratify the general population into risk categories, detailing the studies conducted, location, study design, outcome, overview of the methods, data source and the identified risk predictors. While striving to build on existing knowledge, the review also identifies the research gaps and the need for further improvement.
Corresponding Author: Yasara Manori Samarakoon <yasara.samarakoon@gmail.com> https://orcid.org/0000-0001-6146-9767
Received: October 2017, Accepted revised version: December 2017, Published: January 2018
Competing Interests: Authors have declared that no competing interests exist.
This is an open-access article distributed under a Creative Commons Attribution-Share Alike 4.0 International License (CC BY-SA 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are attributed and materials are shared under the same license.


Introduction
Globally, colorectal cancer is ranked as the third most common cancer in men and the second most common in women [1]. The tests available to screen for colorectal cancer range from simple tests, such as the faecal occult blood test, to more technical and invasive methods, such as flexible sigmoidoscopy and colonoscopy, which have better sensitivity and specificity than other methods [2]. The lifetime risk of colorectal cancer in a Western country is about 5% [1]. Thus, screening the whole population would benefit only this 5%, whilst the remaining 95% might undergo an invasive, high-cost procedure with no personal gain [2]. Evidence from developed countries suggests that it is more efficient to offer colorectal cancer screening by colonoscopy or flexible sigmoidoscopy to high-risk population groups than to offer it to all as a routine screening test [3]. This has prompted many countries to explore high-risk screening for colorectal cancer with appropriate risk stratification of individuals [4].
With the growing recognition of the potential harms of population-based cancer screening programmes, screening based on risk stratification has been proposed as a way of reducing harm and of focusing on the population at risk [5]. Risk-stratified cancer prevention requires risk assessment tools that can be used in primary care to identify those most likely to benefit from the intervention [6]. Of the tools available to assess individualized cancer risk, risk prediction models that are simple and can be applied in a community setting by a trained person are considered useful [7].
Risk prediction modelling estimates the probability of an individual having a certain condition based on the presence of multiple risk factors [7]. An essential feature of a risk prediction model is that it uses multiple predictors to assess an individual's risk of future occurrence of a specific outcome [8]. In developing risk prediction models, it is important to obtain accurate risk estimates for genetic, environmental and behavioural factors and for clinical biological markers. This is usually achieved via cohort or case-control studies [9]. Incorporation of variables from published data and expert opinion is another method of selecting risk predictors [10,11].
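As a generic illustration of how multiple predictors are combined into a single probability, a logistic risk model can be written as below. This is a textbook form given only as a sketch; the symbols $x_j$ (predictor values), $\beta_j$ (regression coefficients), $\beta_0$ (intercept) and $k$ (number of predictors) are assumptions introduced here and do not describe any specific model covered by this review.

```latex
P(Y = 1 \mid x_1, \dots, x_k)
  = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)\right)}
```

Here $Y = 1$ denotes occurrence of the outcome (for example, colorectal cancer), and each $\beta_j$ is the change in log-odds associated with a one-unit change in predictor $x_j$.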
Many risk prediction models have been developed in different parts of the world to identify populations at high risk for colorectal cancer in specific settings. Knowledge of the study designs and statistical methods used helps researchers and service providers in the field of colorectal cancer to judge the comprehensiveness and applicability of these models. Thus, documentation of available evidence from existing studies may serve as a useful information base. Against this background, a scoping review was performed to identify and analyse the published risk prediction models for colorectal cancer worldwide. The methodologies used were also reviewed and summarized to assist researchers in the field.

Methods
Scoping reviews are distinguished from systematic reviews by their focus on providing an overview of the research landscape as a platform for future research. Unlike a systematic review, a scoping review does not evaluate research quality or provide a synthesis or meta-analysis of findings [12]. The present scoping review was conducted with the objective of identifying existing risk prediction models for colorectal cancer. Arksey and O'Malley's (2005) scoping review framework was used. This framework comprises five key stages: identifying the research question, identifying relevant studies, selecting studies, charting the data, and collating, summarizing and reporting the results [13]. The last two stages are described together in this review.

Identifying the research question
The review focused on the research question, 'what are the existing risk prediction models to identify the risk of developing colorectal cancer among individuals in different countries and settings?'

Identifying relevant studies
The review included a search of the scientific literature via PubMed/MEDLINE and the Cochrane database. In line with the research question, broad search terms were used. These included 'colorectal neoplasm,' 'risk/risk factor/risk assessment,' and 'prediction/model/score.' The search was restricted to articles published in peer-reviewed journals between 1 January 2000 and 20 May 2017. Searches were limited to papers in the English language. As guided by Arksey and O'Malley's (2005) framework, all articles were screened for relevance to the research question. To capture any missed articles, including those in non-medical databases, a secondary Google Scholar search based on the research question was carried out. The selection process is shown in Figure 1.

Selecting studies
The titles and abstracts were reviewed using predetermined screening criteria. Inclusion and exclusion criteria are listed in    Following the identification of the relevant articles, the first author reviewed the full text of each article, confirming relevance and reviewing overall themes. After duplicates were removed and the screening criteria applied, forty-one articles were retained for the final review.
Agreement was obtained on overall patterns and gaps.

Charting of data and collating, summarizing and reporting results
Each of the forty-one articles was charted according to the author (year), study location, study design, outcome of the model, overview of the methods and data source. Because the aim of this scoping review was to identify the existing risk prediction models rather than to evaluate their quality, charting emphasized the basic characteristics of the articles; the validation details of the models (which correspond to the quality of the models) were not reviewed. For studies that reported multiple models, such as separate models for men and women or for different subsites, each model was included separately.

Identified risk prediction models
Forty-one eligible articles were included in the present scoping review, and they described fifty-eight risk prediction models. Among the identified models, the outcome was colorectal cancer in 29, advanced colorectal cancer (defined as an invasive cancer, an adenoma of 10 mm or more, a villous adenoma or an adenoma with high-grade dysplasia) in 12, colon cancer in 11 and rectal cancer in six.

Development of the models
Risk factors in these risk prediction models were determined using various study designs. A total of twenty-three models were developed from hospital-based (n=14) or population-based (n=9) cross-sectional studies in participants undergoing screening colonoscopy, while another 13 models were developed from case-control studies. The cases were identified from hospitals (n=6) or population registries (n=7), while controls were identified from hospitals, primary care or the community. Sixteen models were developed using cohort designs in which most of the cases were identified through cancer registries over a period of follow-up. Four risk prediction models were developed by reviewing published literature [30,31,32,49], while one model was developed using a consensus procedure [54].
The predominant approach to developing the risk prediction models was statistical. A majority of the studies used multiple logistic regression to identify the risk predictors [16,27,28,30,33,34,35,41,43,47,50,52], followed by the allocation of risk points based on the values of the beta-coefficients [14,15,18,21,23,24,36,37,39,45,46,48]. Fourteen models used Cox proportional hazards regression to develop the risk model and the score [17,19,20,22,38,40]. One model [30] was developed from a meta-analysis of various studies, one [31] used risk modelling software in a simulated population and one [26] used bivariate analysis alone. Several other statistical methods were used in the development of models, such as jackknife feature selection and ANOVA testing [32], multivariate non-linear Poisson regression [42], recursive partitioning analysis [44] and classification tree analysis [53]. However, four models were developed by non-statistical methods such as the consensus method [25,51], extraction of data from previously published validated models [49] or from previous studies [54].
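The allocation of risk points from beta-coefficients described above can be sketched as follows. This is a minimal illustration of one common convention (dividing each coefficient by the smallest one and rounding to a whole number), not the procedure of any particular study in this review. The predictor names, coefficient values and intercept are hypothetical and chosen only to make the example run.

```python
import math

# Hypothetical beta-coefficients (log odds ratios), for illustration only.
# They are NOT taken from any model included in this review.
BETAS = {
    "age_60_or_over": 0.90,
    "male_sex": 0.50,
    "family_history": 0.60,
    "current_smoker": 0.30,
}
INTERCEPT = -4.0  # hypothetical baseline log-odds


def assign_points(betas):
    """Scale each coefficient by the smallest one and round to whole points."""
    base = min(abs(b) for b in betas.values())
    return {name: round(b / base) for name, b in betas.items()}


def risk_score(person, points):
    """Sum the points of every risk factor present in the individual."""
    return sum(points[name] for name, present in person.items() if present)


def predicted_probability(person, betas, intercept):
    """Probability from the logistic model, using the unrounded coefficients."""
    linear = intercept + sum(betas[n] for n, present in person.items() if present)
    return 1.0 / (1.0 + math.exp(-linear))


points = assign_points(BETAS)
person = {
    "age_60_or_over": True,
    "male_sex": True,
    "family_history": False,
    "current_smoker": True,
}
score = risk_score(person, points)
prob = predicted_probability(person, BETAS, INTERCEPT)
print(points, score, round(prob, 4))
```

A points-based score of this kind trades some precision for ease of use: the rounded score can be added up by hand in a community setting, whereas the probability requires the full regression equation.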

Risk predictors in the developed models
The risk prediction models identified can be broadly categorized as non-genetic, genetic and mixed models with both genetic and non-genetic predictors. Among the models, four were purely genetic [17,32,34,47], seven had both components [16,27,31,35,39] and the rest were non-genetic (n=48). The authors of the 58 risk prediction models considered 77 different risk factors (excluding genetic factors), as shown in Table 3.

Discussion
A comprehensive review was performed that identified 58 risk prediction models in forty-one studies. This scoping review demonstrates that multiple risk prediction models exist for predicting the risk of developing colorectal cancer, advanced colorectal cancer, colon cancer and rectal cancer among asymptomatic male and female population groups. The largest group (23 models) was developed using data from analytical cross-sectional studies. The remaining models came mainly from cohort studies (16 models) and case-control studies (13 models). Though many used multiple logistic regression in developing the model, a minority incorporated non-statistical methods such as consensus processes and literature review. The identified models ranged from purely non-genetic models to purely genetic models, with a small number combining both components.
The main strength of this review is the extensive search strategy and careful screening of the studies applicable to the research question. Use of a broad search strategy allowed us to identify many more risk models than reported in previous reviews of risk prediction in colorectal cancer. Therefore, this review is more comprehensive and up to date. However, the inclusion criteria covered only asymptomatic individuals and excluded symptomatic and already diagnosed populations, limiting the applicability of the models for those with familial syndromes such as hereditary non-polyposis colorectal cancer or familial adenomatous polyposis. Furthermore, since the research question was to identify existing models, the performance of the risk prediction models was not evaluated with respect to their discriminative power and calibration properties. This is a drawback, as the usefulness of the models in terms of validity could not be shown.
This scoping review demonstrates that the existing risk models have the capacity to stratify the general population into risk categories. Risk stratification applied through these models can help to identify the populations who may benefit from invasive screening, sparing those at low risk of disease from the direct and indirect harms of screening procedures. It may also address the cost-effectiveness of screening programmes using colonoscopy, since risk stratification can limit the number of individuals referred for screening. The use of risk prediction models can also encourage screening behaviour among the public and provide an opportunity to promote lifestyle changes.
However, several challenges can be anticipated when implementing the existing models in clinical practice. Many require collection of dietary information using food frequency questionnaires. Though these can generate accurate estimates in the research setting, their practical applicability at population level is questionable. With assessment of lifetime physical activity, recall bias becomes an issue. Furthermore, the accuracy of information collected from sources other than routine medical reports is questionable. On the other hand, access to medical reports is itself a practical challenge when applying these models at community level. In addition, models that include genetic variants require blood sample collection and processing, which is neither user friendly nor feasible at population level and adds to the cost.
It is necessary to evaluate the performance of these models with respect to their discriminative and calibration properties. Evaluation of the utility of these models in. The role of the currently available models in clinical practice will be defined when comparative data on the performance of different models become available. However, the choice of which risk model is applicable to each country will need to be based on validation studies in the population of interest. Furthermore, research is needed to identify optimal implementation strategies, assessing feasibility, accessibility, cost-effectiveness and impact on morbidity and mortality in comparison with existing programmes. Finally, it is necessary to assess the advantages and disadvantages of implementing these risk models in clinical practice via randomized controlled trials.
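The two performance properties named above can be illustrated with a short sketch. Discrimination is commonly summarized by the C-statistic (the proportion of case and non-case pairs in which the case received the higher predicted risk), and one simple calibration summary is the ratio of observed to expected events. The function names and the toy data below are assumptions made for illustration; they are not results from any model in this review.

```python
def c_statistic(probabilities, outcomes):
    """Share of case/non-case pairs where the case has the higher predicted risk.

    Tied predictions count as half a concordant pair.
    """
    cases = [p for p, y in zip(probabilities, outcomes) if y == 1]
    non_cases = [p for p, y in zip(probabilities, outcomes) if y == 0]
    concordant = 0.0
    for p_case in cases:
        for p_non in non_cases:
            if p_case > p_non:
                concordant += 1.0
            elif p_case == p_non:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


def observed_to_expected(probabilities, outcomes):
    """Observed number of events divided by the sum of predicted risks."""
    return sum(outcomes) / sum(probabilities)


# Toy data for illustration only (predicted risks and observed outcomes).
predicted = [0.02, 0.05, 0.10, 0.20, 0.40, 0.03]
observed = [0, 0, 1, 0, 1, 0]

auc = c_statistic(predicted, observed)
oe_ratio = observed_to_expected(predicted, observed)
print(auc, oe_ratio)
```

A C-statistic of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, while an observed-to-expected ratio above 1 indicates that the model under-predicts risk in the sample.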

Conclusions
This comprehensive and up-to-date scoping review describes this emergent body of literature, detailing the studies conducted, location, study design, outcome of the model, overview of the methods, data source and risk predictors identified. It demonstrates the capacity of the existing risk models to stratify the general population into risk categories. While striving to build on existing knowledge, the review also identifies the research gaps and the need for further improvement.

Figure 1: Selection process in the scoping review

Table 2: The basic characteristics of the included articles on colorectal risk prediction models
Columns: Authors (year) | Study location | Study design | Outcome | Overview of methods | Factors identified | Data source
Table 2 summarizes the basic characteristics of the identified risk prediction models.