bug, help wanted
Description
Here's a simple failing test script:
import numpy as np
import tempfile
import unittest
# Generate dummy dataset
np.random.seed(123)
import logging
logging.basicConfig(level=logging.INFO)
import deepchem as dc
from rdkit import Chem
smiles = ["C"]
mols = [Chem.MolFromSmiles(smile) for smile in smiles]
feat = dc.feat.ConvMolFeaturizer()
X = feat.featurize(mols)
y = np.random.rand(1, 1)
train_dataset = dc.data.NumpyDataset(X, y, ids=smiles)
transformers = []
metric = dc.metrics.Metric(
dc.metrics.mean_squared_error, task_averager=np.mean)
model = dc.models.GraphConvModel(n_tasks=1, mode="regression", dropout=0.5, model_dir="./graphconv_model")
model.fit(train_dataset, nb_epoch=1)
orig_score = model.evaluate(train_dataset, [metric], transformers)
print("orig_score")
print(orig_score)
new_model = dc.models.GraphConvModel(n_tasks=1, mode="regression", dropout=0.5, model_dir="./graphconv_model")
new_model.restore()
new_score = new_model.evaluate(train_dataset, [metric], transformers)
print("new_score")
print(new_score)
assert orig_score == new_score
Here's the output this script produces:
/Users/bharath/opt/anaconda3/envs/deepchem/lib/python3.6/site-packages/tensorflow/python/framework/indexed_slices.py:434: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
INFO:deepchem.models.keras_model:Ending global_step 1: Average loss 0.00475866
INFO:deepchem.models.keras_model:TIMING: model fitting took 5.224 s
INFO:deepchem.metrics:computed_metrics: [0.5235924904790688]
orig_score
{'mean-mean_squared_error': 0.5235924904790688}
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.
Two checkpoint references resolved to different objects (<tf.Variable 'private__graph_conv_keras_model_1/graph_conv_2/kernel:0' shape=(75, 64) dtype=float32, numpy=
array([[ 4.4995248e-03, 6.8631753e-02, -8.2775950e-05, ...,
9.0061918e-02, -5.3373262e-02, 1.2269877e-01],
[ 9.3627259e-02, 6.1479136e-02, 1.3002183e-01, ...,
-4.3840557e-03, -8.5053414e-02, 3.7676141e-02],
[-1.6051233e-01, 6.1249837e-02, 1.3138755e-01, ...,
1.6568883e-01, -1.2686929e-01, -1.7409901e-01],
...,
[ 1.3316955e-01, 9.1872528e-02, 7.7954307e-02, ...,
-1.8381983e-02, -1.9298962e-01, -8.3953649e-02],
[ 9.0767816e-02, 1.6689555e-01, 2.8418824e-02, ...,
-1.8065323e-01, 5.5107340e-02, -1.9644000e-01],
[-1.4626555e-01, -6.9499269e-02, 9.4810143e-02, ...,
1.6276342e-01, -3.9090957e-02, 2.0000762e-01]], dtype=float32)> and <tf.Variable 'private__graph_conv_keras_model_1/graph_conv_2/kernel:0' shape=(75, 64) dtype=float32, numpy=
array([[ 0.01428294, 0.12348534, 0.16362987, ..., 0.14605765,
0.1838846 , 0.02513948],
[ 0.08537485, 0.0934632 , -0.02142404, ..., 0.13571231,
-0.11089277, -0.02331042],
[-0.03228797, 0.04885544, 0.04713757, ..., 0.13593303,
-0.01467833, 0.00903808],
...,
[ 0.18738167, -0.17059079, 0.0418079 , ..., -0.01394288,
0.13641016, -0.05478275],
[ 0.05549066, -0.18584336, 0.06955038, ..., -0.14905915,
-0.14295515, -0.08938611],
[-0.08604956, -0.08486389, 0.11554165, ..., 0.11536823,
0.17340572, 0.03584954]], dtype=float32)>).
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.
Two checkpoint references resolved to different objects (<tf.Variable 'private__graph_conv_keras_model_1/graph_conv_2/bias:0' shape=(64,) dtype=float32, numpy=
array([ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0.00097008, 0. , -0.00099363,
-0.00097946, -0.00098862, 0.00097581, 0. , 0. ,
-0.0008924 , 0. , 0. , 0. , 0. ,
0. , 0.00082348, 0.00078612, 0. , 0. ,
0.00090047, 0.0008615 , 0. , 0. , 0. ,
0. , 0. , -0.00097074, 0. , -0.00073872,
-0.0008829 , -0.00096245, -0.00087156, -0.00089219, 0. ,
-0.00089659, -0.00081582, 0. , 0. , -0.00094864,
-0.00083577, 0. , 0.00091346, 0. , 0.00075098,
0.00093225, -0.00094252, -0.00094864, 0. , 0. ,
0. , 0.0009519 , 0. , 0. , -0.00093958,
0. , -0.00095233, 0.00015859, 0.000994 ], dtype=float32)> and <tf.Variable 'private__graph_conv_keras_model_1/graph_conv_2/bias:0' shape=(64,) dtype=float32, numpy=
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>).
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.
Two checkpoint references resolved to different objects (<tf.Variable 'private__graph_conv_keras_model_1/graph_conv_3/kernel:0' shape=(64, 64) dtype=float32, numpy=
array([[-0.08082333, -0.14572376, -0.19245544, ..., -0.20733352,
-0.0442199 , -0.11743415],
[ 0.07426409, 0.1026492 , 0.02717344, ..., 0.02840988,
-0.15217997, 0.02984542],
[-0.11081054, 0.06778781, -0.20188095, ..., -0.0791709 ,
-0.17596933, -0.20975627],
...,
[-0.14444749, -0.04584648, -0.03178209, ..., 0.10852 ,
-0.00810872, 0.05565516],
[ 0.08839096, 0.21500859, 0.06311794, ..., 0.1613014 ,
-0.2150805 , -0.15202223],
[ 0.14957619, 0.21731606, 0.21284355, ..., -0.08467104,
0.08962891, 0.1785949 ]], dtype=float32)> and <tf.Variable 'private__graph_conv_keras_model_1/graph_conv_3/kernel:0' shape=(64, 64) dtype=float32, numpy=
array([[-0.1336613 , -0.1270378 , -0.0345715 , ..., -0.06275596,
0.1497819 , 0.01383078],
[-0.18531801, 0.18253906, 0.02693482, ..., 0.05082844,
-0.17340271, 0.15489809],
[-0.0875622 , 0.20997234, -0.12924384, ..., 0.1812403 ,
0.19916232, 0.047437 ],
...,
[-0.01278952, -0.20331962, -0.05146629, ..., 0.00744121,
-0.02356011, -0.14106297],
[ 0.02996522, 0.07338057, -0.08159059, ..., -0.10143289,
0.0085362 , 0.11073922],
[-0.1267096 , 0.06842761, -0.14939544, ..., 0.08098547,
-0.14479446, -0.17727807]], dtype=float32)>).
WARNING:tensorflow:Inconsistent references when loading the checkpoint into this object graph. Either the Trackable object references in the Python program have changed in an incompatible way, or the checkpoint was generated in an incompatible program.
Two checkpoint references resolved to different objects (<tf.Variable 'private__graph_conv_keras_model_1/graph_conv_3/bias:0' shape=(64,) dtype=float32, numpy=
array([ 0.001 , 0.001 , -0.001 , 0.00099999, 0.001 ,
0.001 , -0.001 , -0.00099998, 0.001 , 0.00010065,
0.001 , -0.001 , -0.00099999, 0.001 , 0.001 ,
0.001 , 0.00099999, 0.001 , 0.001 , 0.001 ,
0.001 , -0.001 , -0.00099999, -0.001 , -0.001 ,
0.00099999, 0.001 , -0.001 , -0.001 , 0.001 ,
0.001 , 0.001 , 0.001 , 0.001 , 0.001 ,
0.00099999, 0.001 , -0.001 , 0.00099995, 0.001 ,
0.001 , -0.00099996, 0.001 , -0.00099995, -0.001 ,
0.001 , -0.001 , -0.001 , -0.001 , 0.001 ,
0.001 , 0.001 , -0.00099999, -0.001 , -0.001 ,
0.00099999, 0.001 , -0.00099998, 0.001 , 0.00099999,
0.001 , 0.001 , -0.001 , -0.001 ], dtype=float32)> and <tf.Variable 'private__graph_conv_keras_model_1/graph_conv_3/bias:0' shape=(64,) dtype=float32, numpy=
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)>).
INFO:deepchem.metrics:computed_metrics: [0.3948746252299107]
new_score
{'mean-mean_squared_error': 0.3948746252299107}
Traceback (most recent call last):
File "graphconv_save_and_restore.py", line 36, in <module>
assert orig_score == new_score
AssertionError
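I came across this while debugging https://github.com/deepchem/deepchem/pull/1878. I'm fairly sure this is a bug in our save/restore code in general. The restored model scores differently from the original on the same dataset (0.3949 vs. 0.5236), and TensorFlow warns about inconsistent checkpoint references during the restore, so it looks like the restore isn't happening properly.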
I'm a little stumped at this point. Do any of you folks have ideas on what could be behind this?
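One way to narrow this down might be to compare the two models' weights directly after the restore, instead of only comparing scores. Below is a minimal sketch of such a check. The helper name `diff_weights` and the stand-in arrays are hypothetical, made up for illustration; only NumPy is used, so it runs without DeepChem.

```python
import numpy as np


def diff_weights(weights_a, weights_b, atol=1e-6):
    """Return the indices of weight arrays that differ between two models.

    Both arguments are lists of NumPy arrays in the same order, such as the
    lists returned by the Keras method `get_weights()`.
    """
    if len(weights_a) != len(weights_b):
        raise ValueError(
            "weight lists differ in length: %d vs %d"
            % (len(weights_a), len(weights_b)))
    mismatched = []
    for i, (a, b) in enumerate(zip(weights_a, weights_b)):
        a, b = np.asarray(a), np.asarray(b)
        # A shape mismatch or any element outside tolerance counts as a difference.
        if a.shape != b.shape or not np.allclose(a, b, atol=atol):
            mismatched.append(i)
    return mismatched


# Stand-in arrays for illustration only; these are not values from the run above.
trained = [np.full((3, 2), 0.5), np.array([0.001, -0.001])]
restored = [np.full((3, 2), 0.5), np.zeros(2)]
print(diff_weights(trained, restored))  # prints [1]
```

With the script above, the call would be something like `diff_weights(model.model.get_weights(), new_model.model.get_weights())`. That assumes the `model` attribute of a DeepChem `KerasModel` is the underlying `tf.keras.Model` and that its variables already exist at the time of the call, neither of which I have verified for this version. An empty result alongside differing scores would point away from the checkpoint and toward evaluation; a non-empty result would show which variables were not restored.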
CC @peastman @vsomnath