GPAM

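create_two_layer_GPAM_from_data

Factory function that builds a two-layer GPAM model in GPflux (a GP encoder followed by a GP decoder), taking its sizes and initial inducing points from a dataset.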

Parameters:
  • input_data (ndarray) –

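    Training data of shape (num_data, num_input). Its number of rows sets num_data, its number of columns sets the input and reconstructed output dimension, and the first-layer inducing points are sampled from its rows.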

  • num_inducing (int, default: 50 ) –

    Number of inducing points to use. Defaults to 50.

  • return_layers (bool, default: False ) –

    If True, also return the two GP layers alongside the model. Defaults to False.

  • n_latent (int, default: 2 ) –

    Dimension of latent space. Defaults to 2.

Returns:
  • model (gpflux model) –

    Compiled Keras training model. If return_layers is True, the tuple (model, gp_layer1, gp_layer2) is returned instead.

Source code in GPyEDS/GPAM.py
import numpy as np
import tensorflow as tf
import gpflow
import gpflux


def create_two_layer_GPAM_from_data(input_data, num_inducing=50, return_layers=False, n_latent=2):
    """Factory function that builds a two-layer GPAM model in GPflux from a dataset.

    Args:
        input_data (ndarray): Training data of shape (num_data, num_input). Its rows set num_data,
            its columns set the input and reconstructed output dimension, and the first-layer
            inducing points are sampled from its rows.
        num_inducing (int, optional): Number of inducing points per layer. Defaults to 50.
        return_layers (bool, optional): If True, also return the two GP layers. Defaults to False.
        n_latent (int, optional): Dimension of the latent space. Defaults to 2.

    Returns:
        model (gpflux model): Compiled Keras training model, or the tuple
            (model, gp_layer1, gp_layer2) if return_layers is True.
    """
    num_data, num_input = input_data.shape

    # Encoder (layer 1): num_input -> n_latent. Inducing points start at data rows,
    # sampled without replacement unless there are fewer rows than inducing points.
    idx = np.random.choice(num_data, size=num_inducing, replace=num_inducing > num_data)
    Z = input_data[idx]
    kernel1 = gpflow.kernels.SquaredExponential(lengthscales=[1.0] * num_input)
    inducing_variable1 = gpflow.inducing_variables.InducingPoints(Z.copy())
    gp_layer1 = gpflux.layers.GPLayer(
        kernel1,
        inducing_variable1,
        num_data=num_data,
        num_latent_gps=n_latent,
        mean_function=gpflow.mean_functions.Zero(),
    )

    # Decoder (layer 2): n_latent -> num_input, inducing points drawn uniformly in [0, 1).
    kernel2 = gpflow.kernels.SquaredExponential(lengthscales=[1.0] * n_latent)
    inducing_variable2 = gpflow.inducing_variables.InducingPoints(np.random.rand(num_inducing, n_latent))
    gp_layer2 = gpflux.layers.GPLayer(
        kernel2,
        inducing_variable2,
        num_data=num_data,
        num_latent_gps=num_input,
        mean_function=gpflow.mean_functions.Zero(),
    )

    # Gaussian likelihood with initial noise variance 0.1; train with Adam (learning rate 0.01).
    likelihood_layer = gpflux.layers.LikelihoodLayer(gpflow.likelihoods.Gaussian(0.1))
    two_layer_dgp = gpflux.models.DeepGP([gp_layer1, gp_layer2], likelihood_layer)
    model = two_layer_dgp.as_training_model()
    model.compile(tf.optimizers.Adam(0.01))

    if return_layers:
        return model, gp_layer1, gp_layer2
    return model
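The shape bookkeeping inside create_two_layer_GPAM_from_data can be checked without TensorFlow. The sketch below uses a hypothetical helper, gpam_layer_shapes (not part of GPyEDS), that mirrors how the function sizes each layer: the encoder maps num_input dimensions to n_latent and the decoder maps back to num_input. It assumes NumPy's Generator API in place of the global np.random calls in the source, and samples inducing rows without replacement where the data allows it.

```python
import numpy as np


def gpam_layer_shapes(input_data, num_inducing=50, n_latent=2, rng=None):
    """Hypothetical helper: reproduce the size bookkeeping of
    create_two_layer_GPAM_from_data without building any GPflux objects."""
    rng = np.random.default_rng() if rng is None else rng
    num_data, num_input = input_data.shape

    # Layer 1 (encoder): inducing points are rows of the data, so they live in
    # the same num_input-dimensional space as the inputs. Sampling without
    # replacement avoids duplicates whenever there are enough rows.
    idx = rng.choice(num_data, size=num_inducing, replace=num_inducing > num_data)
    Z1 = input_data[idx]

    # Layer 2 (decoder): inducing points live in the n_latent-dimensional
    # latent space and are initialised uniformly in [0, 1).
    Z2 = rng.random((num_inducing, n_latent))

    return {
        "num_data": num_data,
        # One lengthscale per input dimension of each layer's kernel.
        "layer1": {"Z": Z1.shape, "lengthscales": num_input, "outputs": n_latent},
        "layer2": {"Z": Z2.shape, "lengthscales": n_latent, "outputs": num_input},
    }


X = np.random.default_rng(0).normal(size=(200, 5))
shapes = gpam_layer_shapes(X, num_inducing=50, n_latent=2)
print(shapes["layer1"])  # {'Z': (50, 5), 'lengthscales': 5, 'outputs': 2}
print(shapes["layer2"])  # {'Z': (50, 2), 'lengthscales': 2, 'outputs': 5}
```

The decoder's output count equals the encoder's input count, which is what lets the model reconstruct its own inputs.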

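create_two_layer_GPAM_from_scratch

Factory function that builds a two-layer GPAM model in GPflux from explicit sizes rather than a dataset.

Parameters:
  • num_input (int) –

    Number of input dimensions; the decoder reconstructs the same number of outputs.

  • num_data (int, default: 1 ) –

    Number of training data points. GPflux divides each layer's KL term by this value, so set it to the training-set size for a correctly weighted loss. Defaults to 1.

  • Z (ndarray, default: None ) –

    First-layer inducing locations of shape (num_inducing, num_input). Defaults to None, in which case they are drawn uniformly at random in [0, 1).

  • num_inducing (int, default: 50 ) –

    Number of inducing points. Defaults to 50.

  • return_layers (bool, default: False ) –

    If True, also return the two GP layers alongside the model. Defaults to False.

  • n_latent (int, default: 2 ) –

    Dimension of the latent space. Defaults to 2.

Returns:
  • model (gpflux model) –

    Compiled Keras training model. If return_layers is True, the tuple (model, gp_layer1, gp_layer2) is returned instead.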

Source code in GPyEDS/GPAM.py
def create_two_layer_GPAM_from_scratch(num_input, num_data=1, Z=None, num_inducing=50, return_layers=False, n_latent=2):
    """Factory function that builds a two-layer GPAM model in GPflux without a dataset.

    Args:
        num_input (int): Number of input (and reconstructed output) dimensions.
        num_data (int, optional): Number of training data points. GPflux divides each layer's
            KL term by this value, so set it to the training-set size. Defaults to 1.
        Z (ndarray, optional): First-layer inducing locations, shape (num_inducing, num_input).
            If given, its row count sets the number of layer-1 inducing points; num_inducing
            still sizes layer 2. Defaults to None, in which case Z is drawn uniformly in [0, 1).
        num_inducing (int, optional): Number of inducing points. Defaults to 50.
        return_layers (bool, optional): If True, also return the two GP layers. Defaults to False.
        n_latent (int, optional): Dimension of the latent space. Defaults to 2.

    Returns:
        model (gpflux model): Compiled Keras training model, or the tuple
            (model, gp_layer1, gp_layer2) if return_layers is True.
    """
    # Default inducing locations: uniform random points in the unit hypercube.
    if Z is None:
        Z = np.random.rand(num_inducing, num_input)

    # Encoder (layer 1): num_input -> n_latent.
    kernel1 = gpflow.kernels.SquaredExponential(lengthscales=[1.0] * num_input)
    inducing_variable1 = gpflow.inducing_variables.InducingPoints(Z.copy())
    gp_layer1 = gpflux.layers.GPLayer(
        kernel1,
        inducing_variable1,
        num_data=num_data,
        num_latent_gps=n_latent,
        mean_function=gpflow.mean_functions.Zero(),
    )

    # Decoder (layer 2): n_latent -> num_input.
    kernel2 = gpflow.kernels.SquaredExponential(lengthscales=[1.0] * n_latent)
    inducing_variable2 = gpflow.inducing_variables.InducingPoints(np.random.rand(num_inducing, n_latent))
    gp_layer2 = gpflux.layers.GPLayer(
        kernel2,
        inducing_variable2,
        num_data=num_data,
        num_latent_gps=num_input,
        mean_function=gpflow.mean_functions.Zero(),
    )

    likelihood_layer = gpflux.layers.LikelihoodLayer(gpflow.likelihoods.Gaussian(0.1))
    two_layer_dgp = gpflux.models.DeepGP([gp_layer1, gp_layer2], likelihood_layer)
    model = two_layer_dgp.as_training_model()
    model.compile(tf.optimizers.Adam(0.01))

    if return_layers:
        return model, gp_layer1, gp_layer2
    return model
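Reading the two factory functions side by side, create_two_layer_GPAM_from_scratch reproduces the from_data setup when it is given the dataset's sizes and inducing points taken from its rows. The helper scratch_kwargs_from_data below is hypothetical (it is not part of GPyEDS) and only assembles those keyword arguments with NumPy; sampling rows without replacement where possible is an assumption of this sketch.

```python
import numpy as np


def scratch_kwargs_from_data(input_data, num_inducing=50, rng=None):
    """Hypothetical helper: build keyword arguments for
    create_two_layer_GPAM_from_scratch that mirror create_two_layer_GPAM_from_data."""
    rng = np.random.default_rng() if rng is None else rng
    num_data, num_input = input_data.shape
    # Initialise layer-1 inducing points at data rows instead of the default
    # uniform draw in [0, 1)^num_input.
    idx = rng.choice(num_data, size=num_inducing, replace=num_inducing > num_data)
    return {
        "num_input": num_input,
        "num_data": num_data,  # scales each layer's KL term; use the training-set size
        "Z": input_data[idx].copy(),
        "num_inducing": num_inducing,
    }


X = np.random.default_rng(0).normal(size=(500, 4))
kwargs = scratch_kwargs_from_data(X, num_inducing=50)
print(kwargs["Z"].shape, kwargs["num_data"])  # (50, 4) 500
```

With GPflux installed, the result could then be passed on as create_two_layer_GPAM_from_scratch(**kwargs, n_latent=2). Leaving num_data at its default of 1 instead would leave the KL terms unscaled relative to the per-point data term.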

model_inference

Utility function that runs a GPAM encoder over data in batches to limit memory usage.

Parameters:
  • data (ndarray) –

    Data to be used for inference.

  • encoder (model layer) –

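    Encoding layer(s) from a GPAM model, for example the first GP layer returned with return_layers=True. Calling it on a batch must return a distribution exposing mean() and variance().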

  • batch_size (int, default: 20000 ) –

    Number of rows per batch; lower it if memory runs out. Defaults to 20000.

Returns:
  • latents (tuple of 2 ndarrays) –

    Means and variances of the latent distributions, one row per data point.

Source code in GPyEDS/GPAM.py
def model_inference(data, encoder, batch_size=20000):
    """Utility function for batched model inference to reduce memory usage.

    Args:
        data (ndarray): Data to be used for inference, one row per data point.
        encoder (model layer): Encoding layer(s) from a GPAM model. Calling it on a batch
            must return a distribution exposing mean() and variance().
        batch_size (int, optional): Number of rows per batch; lower it if memory runs out.
            Defaults to 20000.

    Returns:
        latents (tuple of 2 ndarrays): Means and variances of the latent distributions,
            one row per data point.
    """
    import tqdm

    means = []
    variances = []
    # Step through the data in contiguous slices of at most batch_size rows. The last
    # slice is simply shorter, so an exact multiple of batch_size never yields an empty batch.
    for start in tqdm.tqdm(range(0, len(data), batch_size)):
        res = encoder(data[start:start + batch_size])
        means.append(np.asarray(res.mean()))
        variances.append(np.asarray(res.variance()))

    return np.concatenate(means, axis=0), np.concatenate(variances, axis=0)
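model_inference only relies on the encoder returning an object with mean() and variance(), so its batching can be exercised with a stand-in. ToyGaussianEncoder and batched_latents below are hypothetical names for this sketch: the toy encoder replaces a GPflux layer with a fixed NumPy mapping, and the loop steps through the data with range(0, len(data), batch_size) (tqdm omitted), so the last batch is just shorter and a length that is an exact multiple of batch_size never produces an empty slice.

```python
import numpy as np


class ToyGaussianEncoder:
    """Hypothetical stand-in for a GPflux layer: calling it returns an object
    exposing mean() and variance(), which is all the batching loop uses."""

    class _Dist:
        def __init__(self, mean, var):
            self._mean, self._var = mean, var

        def mean(self):
            return self._mean

        def variance(self):
            return self._var

    def __call__(self, x):
        # Deterministic toy mapping to a 2-D latent space with constant variance.
        mean = np.stack([x.sum(axis=1), x.mean(axis=1)], axis=1)
        var = np.full_like(mean, 0.1)
        return self._Dist(mean, var)


def batched_latents(data, encoder, batch_size):
    """Run encoder over contiguous slices of at most batch_size rows and stitch
    the per-batch means and variances back together."""
    means, variances = [], []
    for start in range(0, len(data), batch_size):
        res = encoder(data[start:start + batch_size])
        means.append(np.asarray(res.mean()))
        variances.append(np.asarray(res.variance()))
    return np.concatenate(means, axis=0), np.concatenate(variances, axis=0)


X = np.random.default_rng(0).normal(size=(40000, 4))
mu, var = batched_latents(X, ToyGaussianEncoder(), batch_size=20000)
print(mu.shape, var.shape)  # (40000, 2) (40000, 2)
```

Concatenating the per-batch results gives the same arrays as one unbatched call, which is the property worth confirming before relying on a batched run with a real encoder.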