RBM - how to get the affinity matrix from item_back_dict and user_back_dict
#868 opened on Jul 18, 2019
Description
I am trying to implement AzureML Hyperdrive-based hyperparameter tuning of the RBM algorithm using the example notebooks. I have a working RBM notebook with my dataset, and I am using svd_training.py as a template for building my rbm_training.py file. As part of the RBM process, an affinity matrix is created, and the training and test sets are built with the stratified sampler. Looking at the code, I found an optional parameter, save_path, that stores four NumPy output files (item_back_dict.npy, item_dict.npy, user_back_dict.npy and user_dict.npy) when the class is invoked as follows:
am1m = AffinityMatrix(DF=data, **header, save_path=DATA_DIR)
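To make the four files concrete: a forward dictionary presumably maps an original ID to a matrix row/column index, and the matching back dictionary inverts it. The sketch below builds and saves such maps under the same file names. `build_and_save_index_dicts` is a hypothetical helper, and the first-occurrence ordering of IDs is an assumption, not necessarily what sparse.py does.

```python
import os
import tempfile

import numpy as np


def build_and_save_index_dicts(user_ids, item_ids, save_path):
    """Hypothetical sketch: build forward/back index maps and save them as
    the four .npy files. ID ordering (first occurrence) is an assumption."""
    # Unique IDs in order of first appearance
    unique_users = list(dict.fromkeys(user_ids))
    unique_items = list(dict.fromkeys(item_ids))

    user_dict = {uid: idx for idx, uid in enumerate(unique_users)}  # original ID -> row index
    item_dict = {iid: idx for idx, iid in enumerate(unique_items)}  # original ID -> column index
    user_back_dict = {idx: uid for uid, idx in user_dict.items()}   # row index -> original ID
    item_back_dict = {idx: iid for iid, idx in item_dict.items()}   # column index -> original ID

    dicts = {
        "user_dict": user_dict,
        "item_dict": item_dict,
        "user_back_dict": user_back_dict,
        "item_back_dict": item_back_dict,
    }
    for name, d in dicts.items():
        # np.save wraps the dict in a 0-d object array and appends ".npy"
        np.save(os.path.join(save_path, name), d)
    return dicts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_and_save_index_dicts([10, 10, 42], ["a", "b", "a"], tmp)
        print(sorted(os.listdir(tmp)))
```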
I am uploading the train and validation .pkl data files from my local machine to the default datastore.
During evaluation, the following code requires the affinity matrix:
top_k_df_1m = am1m.map_back_sparse(top_k_1m, kind='prediction')
test_df_1m = am1m.map_back_sparse(Xtst_1m, kind='ratings')
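Conceptually, mapping back only needs the two back dictionaries: each non-zero cell (row, column) of the dense matrix is translated into (original user ID, original item ID, value). The following is a minimal NumPy sketch under that assumption. `map_back_dense` and the column names `userID`, `itemID` and `rating` are hypothetical; this is not the library's map_back_sparse implementation.

```python
import numpy as np


def map_back_dense(X, user_back_dict, item_back_dict):
    """Hypothetical stand-in for the map-back step: convert a dense
    user x item matrix into columns of original IDs and values.
    Zeros are assumed to mean 'unrated' / 'not in top-k' and are dropped."""
    X = np.asarray(X)
    rows, cols = np.nonzero(X)  # indices of the non-zero cells
    return {
        "userID": np.array([user_back_dict[int(r)] for r in rows]),
        "itemID": np.array([item_back_dict[int(c)] for c in cols]),
        "rating": X[rows, cols],
    }


if __name__ == "__main__":
    X = np.array([[0, 4.0], [5.0, 0]])
    print(map_back_dense(X, {0: 10, 1: 42}, {0: "a", 1: "b"}))
```

If pandas is available, `pd.DataFrame(map_back_dense(...))` turns the result into a DataFrame comparable to the evaluation inputs.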
How do I regenerate the affinity matrix object in the script that runs remotely (rbm_training.py)? I was hoping to use the four NumPy files to enable map_back_sparse, so that I don't have to upload the entire dataset and regenerate an AffinityMatrix object remotely.
The AffinityMatrix code in sparse.py mentions that the NumPy files can be used with a trained model, but I am not sure how to load these four files to regenerate an AffinityMatrix object when the remote script executes.
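One possible workaround, sketched under the assumption that the remote script only needs the ID mappings (not the full DataFrame passed to AffinityMatrix), is to load the four .npy files directly. `load_index_dicts` is a hypothetical helper. `allow_pickle=True` is required because the files hold pickled dict objects, and NumPy has defaulted to `allow_pickle=False` since version 1.16.3.

```python
import os
import tempfile

import numpy as np

# File stems assumed to match the names written via save_path
FILE_NAMES = ("user_dict", "item_dict", "user_back_dict", "item_back_dict")


def load_index_dicts(save_path):
    """Hypothetical helper: reload the four ID dictionaries from save_path."""
    dicts = {}
    for name in FILE_NAMES:
        arr = np.load(os.path.join(save_path, name + ".npy"), allow_pickle=True)
        dicts[name] = arr.item()  # unwrap the 0-d object array into the original dict
    # Sanity check: each back dictionary should invert its forward dictionary
    for fwd, back in (("user_dict", "user_back_dict"), ("item_dict", "item_back_dict")):
        if {idx: key for key, idx in dicts[fwd].items()} != dicts[back]:
            raise ValueError(f"{back} is not the inverse of {fwd}")
    return dicts


if __name__ == "__main__":
    # Demo with dummy files standing in for the uploaded DATA_DIR contents
    with tempfile.TemporaryDirectory() as tmp:
        np.save(os.path.join(tmp, "user_dict"), {10: 0, 42: 1})
        np.save(os.path.join(tmp, "user_back_dict"), {0: 10, 1: 42})
        np.save(os.path.join(tmp, "item_dict"), {"a": 0, "b": 1})
        np.save(os.path.join(tmp, "item_back_dict"), {0: "a", 1: "b"})
        print(load_index_dicts(tmp)["item_back_dict"])
```

Because `allow_pickle=True` unpickles arbitrary objects, it should only be used on files you produced yourself, such as the ones written via save_path.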