In a KNN-like algorithm we need to load the model data into a cache in order to predict the class of incoming records. Here is the example for KNN.
So if the model is a large file, say 1 or 2 GB, will we be able to load it into the Distributed Cache?
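Roughly, this is how I picture loading the model through the Distributed Cache today (only a minimal sketch; the HDFS path, the "#model" symlink name and the comma-separated model format are placeholders I made up):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class KnnMapper extends Mapper<LongWritable, Text, Text, Text> {

    // In-memory copy of the whole model; this is what becomes a problem at 1-2 GB.
    private final List<double[]> modelVectors = new ArrayList<>();
    private final List<String> modelLabels = new ArrayList<>();

    @Override
    protected void setup(Context context) throws IOException {
        // The model file was added to the distributed cache with the "#model"
        // fragment, so it is symlinked as "model" in the task's working directory.
        try (BufferedReader reader = new BufferedReader(new FileReader("model"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Placeholder model format: label,f1,f2,...,fn
                String[] parts = line.split(",");
                double[] features = new double[parts.length - 1];
                for (int i = 1; i < parts.length; i++) {
                    features[i - 1] = Double.parseDouble(parts[i]);
                }
                modelLabels.add(parts[0]);
                modelVectors.add(features);
            }
        }
    }

    // map() would do the per-record distance calculation; omitted here.

    // Driver side: register the model file in the distributed cache.
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "knn");
        job.setJarByClass(KnnMapper.class);
        // Placeholder path for the real model location on HDFS.
        job.addCacheFile(new URI("hdfs:///user/me/knn_model.txt#model"));
        // ... remaining job configuration (input/output paths, reducer, etc.)
    }
}
```

With this setup the whole model sits in every mapper's memory, which is exactly what worries me once it grows to 1 or 2 GB.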
One way is to split/partition the model into a number of files, perform the distance calculation of each record against every partition, and then take the minimum distances and the most frequent class label among them to predict the outcome.
How can we partition the model file and perform this operation against each of the partitions?
i.e. 1st record -> partition1, partition2, ...; 2nd record -> partition1, partition2, ...
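To make that idea concrete, here is a minimal sketch (plain Java, outside Hadoop) of what I mean: each record is compared against every model partition in turn, a running set of the k nearest neighbours is carried across the partitions, and the majority class label among them is the prediction. The value of k, the Euclidean distance and the in-memory List partitions are only assumptions for illustration:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class PartitionedKnn {

    // One labelled model point together with its distance to the test record.
    static class Neighbour {
        final double distance;
        final String label;
        Neighbour(double distance, String label) {
            this.distance = distance;
            this.label = label;
        }
    }

    // Plain Euclidean distance; any other metric could be plugged in here.
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Scan one model partition, keeping at most k nearest neighbours in the heap. */
    static void scanPartition(double[] record,
                              List<double[]> partitionVectors,
                              List<String> partitionLabels,
                              int k,
                              PriorityQueue<Neighbour> topK) {
        for (int i = 0; i < partitionVectors.size(); i++) {
            double d = euclidean(record, partitionVectors.get(i));
            topK.add(new Neighbour(d, partitionLabels.get(i)));
            if (topK.size() > k) {
                topK.poll();          // drop the farthest of the current candidates
            }
        }
    }

    /** Classify one record: run it against every partition, then take a majority vote. */
    static String classify(double[] record,
                           List<List<double[]>> partitions,
                           List<List<String>> partitionLabels,
                           int k) {
        // Max-heap on distance, so the farthest kept neighbour sits on top.
        PriorityQueue<Neighbour> topK = new PriorityQueue<>(
                Comparator.comparingDouble((Neighbour n) -> n.distance).reversed());

        // "1st record -> partition1, partition2, ...": the same heap is carried
        // across all partitions, so the final k neighbours are global.
        for (int p = 0; p < partitions.size(); p++) {
            scanPartition(record, partitions.get(p), partitionLabels.get(p), k, topK);
        }

        // Majority vote (the "max occurrence of class label") over the k nearest.
        Map<String, Integer> votes = new HashMap<>();
        for (Neighbour n : topK) {
            votes.merge(n.label, 1, Integer::sum);
        }
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }
}
```

In MapReduce terms I imagine each scanPartition call could become one map task over one model split, with the per-partition nearest neighbours combined in a reducer before the vote, but I am not sure that is the right way to structure it.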
This is what came to my mind. Is there a better way? Any pointers would help me.