Abstract

A NEURAL NETWORK BASED CLUSTERING MODEL OF A COLOMBIAN COHORT OF RHEUMATOID ARTHRITIS PATIENTS

Background: Rheumatoid arthritis (RA) is a chronic disease characterized by inflammation and joint pain. In daily clinical practice, multiple variables of differing nature are commonly used to define the current state of the disease, the patient’s risk profile, and the subsequent optimal treatment.

Objectives: We aimed to identify the most influential variables through a suitable multivariable clustering, and its labeling, of an outpatient clinic-based cohort of Colombian RA patients.

Methods: We applied a clustering model (Kohonen’s self-organizing map, SOM) to 23 variables (17 continuous and 6 discrete) obtained from 14,811 follow-up visit records held in a previously preprocessed database of a cohort whose data were collected prospectively between 2013 and 2020. The included variables were the disease activity indices (DAS28-ESR/CRP, CDAI, and SDAI; as outcome variables), serological status (autoantibody positivity), and patients’ sociodemographic and clinical characteristics. The SOM had a size of 25 x 25 neurons and was trained for 10,000 iterations. SOM forms the groups by comparing Euclidean distances in the hyperspace whose dimensions are the variables. After clustering, a discrete label built on the categorization of disease activity allowed us to examine how the included variables behaved with respect to these outcomes, without affecting the clustering process. We evaluated the corresponding weights and their influence on the proposed neural network.

Results: Data from a total of 1,277 patients were included in the analysis. When continuous and discrete variables were integrated, the discrete data were transformed using one-hot encoding, which creates one new variable per category.
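The abstract does not say how the encoding or the SOM were implemented, so the following is only a minimal sketch of the two steps described above: one-hot encoding of a discrete variable, and SOM training in which the best-matching neuron is chosen by Euclidean distance. It uses only the Python standard library. The function names (`one_hot`, `min_max_scale`, `train_som`), the min-max scaling, the Gaussian neighbourhood, the linear decay schedules, the 3 x 3 grid, and the toy records are all assumptions made for illustration; only the 25 x 25 grid and 10,000 iterations come from the abstract.

```python
import math
import random


def one_hot(value, categories):
    """Encode one categorical value as a 0/1 list, one slot per category."""
    return [1.0 if value == c else 0.0 for c in categories]


def min_max_scale(rows):
    """Rescale each column to [0, 1] (an assumed preprocessing choice)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance; its minimiser is also the Euclidean one."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, x):
    """Return the (row, col) of the neuron whose weights are nearest to x."""
    return min(weights, key=lambda rc: sq_dist(weights[rc], x))


def train_som(data, rows, cols, iterations, lr0=0.5, seed=0):
    """Train a rectangular SOM with one randomly drawn sample per iteration."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for t in range(iterations):
        frac = t / iterations
        lr = lr0 * (1.0 - frac)                    # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)    # shrinking neighbourhood
        x = rng.choice(data)
        br, bc = best_matching_unit(weights, x)
        for (r, c), w in weights.items():
            grid_d2 = (r - br) ** 2 + (c - bc) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])     # pull neuron toward x
    return weights


# Toy records, invented for illustration: (activity score, age, RF status).
records = [
    (2.1, 45, "positive"),
    (2.4, 50, "positive"),
    (5.8, 62, "negative"),
    (6.1, 66, "negative"),
]
categories = ["negative", "positive"]

continuous = min_max_scale([[r[0], r[1]] for r in records])
data = [c + one_hot(r[2], categories) for c, r in zip(continuous, records)]

# A 3 x 3 map keeps the example fast; the abstract reports 25 x 25 neurons
# and 10,000 iterations.
som = train_som(data, rows=3, cols=3, iterations=500)
bmus = [best_matching_unit(som, x) for x in data]
print(bmus)
```

Each record is mapped to the grid position of its best-matching neuron, and records that share a position (or sit on neighbouring neurons) form a group. A label such as a disease activity category can then be attached to each position after training, so that it plays no part in the clustering itself.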
Dissimilarity between groups was very low when only the continuous variables were considered, and it increased when all the other variables were added; likewise, the clustering organization remained the same regardless of the clinimetric index used for labeling (Figure 1a). In the construction of the groups, the influence of rheumatoid factor (RF) and anti-citrullinated protein antibody (ACPA) positivity was confirmed; furthermore, antinuclear antibodies (ANAs) had a significant effect on disease activity, especially in patients with negative ANAs or with positive ANAs showing a homogeneous pattern (Figure 1b).

Figure 1. Clusters and heatmaps of variables’ weights

Conclusion: SOM, like other artificial neural networks (ANNs), is an important method for clustering and 2D visualization, given the multivariate nature of clinical data and the difficulty of visualizing them in the resulting n-dimensional hyperspace. The labels used confirm that the clustering is adequate, since records with similar characteristics and an equivalent disease activity score were grouped identically. These findings suggest a potentially pivotal role for RF, ACPA, and ANAs, and for their interaction with the proposed outcome variables, in understanding and developing future classification or prediction models based on artificial intelligence and big data methods rather than on classical epidemiological approaches.

REFERENCES:
[1] Aletaha D, Neogi T, Silman AJ, Funovits J, Felson DT, Bingham CO, et al. 2010 Rheumatoid Arthritis Classification Criteria. Arthritis & Rheumatism. 2010;62(9):2569–2581.

Disclosure of Interests: None declared.

Citation: , volume 81, supplement 1, year 2022, page 536
Session: Rheumatoid arthritis - prognosis, predictors and outcome (POSTERS only)
