Models included in the library
Baseline Models
Neural Collaborative Filtering
- class src.models.collaborativeFiltering.CollaborativeFiltering(lr, n_factors, data_dir, metrics={})
- Neural Collaborative Filtering model.
- Models user-news article interactions through a neural network.
- Implementation based on the PyTorch implementation from: https://github.com/andreas-bauer/ml-movie-recommender
- Parameters:
- lr (float) – learning rate 
- n_factors (int) – number of factors for embeddings 
- data_dir (str) – data directory 
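To illustrate what lr and n_factors control, here is a simplified matrix-factorization sketch in pure Python. It learns one n_factors-dimensional embedding per user and per news article with plain SGD, so that their dot product approximates an interaction label. This is not the library's neural architecture. train_mf and score are hypothetical names, and the toy clicks are made up.

```python
import random


def score(users, items, u, i):
    """Predicted affinity of user u for item i (dot product of embeddings)."""
    return sum(a * b for a, b in zip(users[u], items[i]))


def train_mf(interactions, n_users, n_items, n_factors=8, lr=0.05, epochs=500, seed=0):
    """Learn user/item embeddings so that dot(user, item) approximates the label.

    interactions: list of (user_idx, item_idx, label) triples.
    """
    rng = random.Random(seed)
    # One n_factors-dimensional vector per user and per item, small random init.
    users = [[rng.gauss(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    items = [[rng.gauss(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, label in interactions:
            err = score(users, items, u, i) - label  # gradient of 0.5 * err**2
            for k in range(n_factors):
                u_k, i_k = users[u][k], items[i][k]
                users[u][k] -= lr * err * i_k
                items[i][k] -= lr * err * u_k
    return users, items


# Toy data: user 0 clicked article 0, user 1 clicked article 1.
clicks = [(0, 0, 1.0), (0, 1, 0.0), (1, 0, 0.0), (1, 1, 1.0)]
users, items = train_mf(clicks, n_users=2, n_items=2)
```

A larger n_factors gives each embedding more capacity. A larger lr speeds up fitting but can make SGD unstable.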
 
 - configure_optimizers()
- Choose what optimizers and learning-rate schedulers to use in your optimization. Normally you need one, but in the case of GANs or similar models you might have multiple.
- Returns:
- Any of these 6 options:
- Single optimizer.
- List or tuple of optimizers.
- Two lists: the first list has multiple optimizers, and the second has multiple LR schedulers (or multiple lr_scheduler_config dictionaries).
- Dictionary, with an "optimizer" key, and (optionally) an "lr_scheduler" key whose value is a single LR scheduler or lr_scheduler_config.
- Tuple of dictionaries as described above, with an optional "frequency" key.
- None: fit will run without any optimizer.
 
 - The lr_scheduler_config is a dictionary which contains the scheduler and its associated configuration. The default configuration is shown below.

    lr_scheduler_config = {
        # REQUIRED: The scheduler instance
        "scheduler": lr_scheduler,
        # The unit of the scheduler's step size, could also be 'step'.
        # 'epoch' updates the scheduler on epoch end whereas 'step'
        # updates it after an optimizer update.
        "interval": "epoch",
        # How many epochs/steps should pass between calls to
        # `scheduler.step()`. 1 corresponds to updating the learning
        # rate after every epoch/step.
        "frequency": 1,
        # Metric to monitor for schedulers like `ReduceLROnPlateau`
        "monitor": "val_loss",
        # If set to `True`, will enforce that the value specified in 'monitor'
        # is available when the scheduler is updated, thus stopping
        # training if not found. If set to `False`, it will only produce a warning
        "strict": True,
        # If using the `LearningRateMonitor` callback to monitor the
        # learning rate progress, this keyword can be used to specify
        # a custom logged name
        "name": None,
    }

- When there are schedulers in which the .step() method is conditioned on a value, such as the torch.optim.lr_scheduler.ReduceLROnPlateau scheduler, Lightning requires that the lr_scheduler_config contains the keyword "monitor" set to the metric name that the scheduler should be conditioned on.
- Metrics can be made available to monitor by simply logging them using self.log('metric_to_track', metric_val) in your LightningModule.
- Note: The frequency value specified in a dict along with the optimizer key is an int corresponding to the number of sequential batches optimized with the specific optimizer. It should be given to none or to all of the optimizers. There is a difference between passing multiple optimizers in a list and passing multiple optimizers in dictionaries with a frequency of 1:
- In the former case, all optimizers will operate on the given batch in each optimization step.
- In the latter, only one optimizer will operate on the given batch at every step.
 - This is different from the frequency value specified in the lr_scheduler_config mentioned above.

    import torch

    def configure_optimizers(self):
        optimizer_one = torch.optim.SGD(self.model.parameters(), lr=0.01)
        optimizer_two = torch.optim.SGD(self.model.parameters(), lr=0.01)
        return [
            {"optimizer": optimizer_one, "frequency": 5},
            {"optimizer": optimizer_two, "frequency": 10},
        ]

- In this example, the first optimizer will be used for the first 5 steps, the second optimizer for the next 10 steps, and that cycle will continue. If an LR scheduler is specified for an optimizer using the lr_scheduler key in the above dict, the scheduler will only be updated when its optimizer is being used.
- Examples:

    from torch.optim import Adam
    from torch.optim.lr_scheduler import CosineAnnealingLR, ExponentialLR

    # most cases. no learning rate scheduler
    def configure_optimizers(self):
        return Adam(self.parameters(), lr=1e-3)

    # multiple optimizer case (e.g.: GAN)
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        return gen_opt, dis_opt

    # example with learning rate schedulers
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        dis_sch = CosineAnnealingLR(dis_opt, T_max=10)
        return [gen_opt, dis_opt], [dis_sch]

    # example with step-based learning rate schedulers
    # each optimizer has its own scheduler
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        gen_sch = {
            'scheduler': ExponentialLR(gen_opt, 0.99),
            'interval': 'step'  # called after each training step
        }
        dis_sch = CosineAnnealingLR(dis_opt, T_max=10)  # called every epoch
        return [gen_opt, dis_opt], [gen_sch, dis_sch]

    # example with optimizer frequencies
    # see training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1
    # https://arxiv.org/abs/1704.00028
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        n_critic = 5
        return (
            {'optimizer': dis_opt, 'frequency': n_critic},
            {'optimizer': gen_opt, 'frequency': 1}
        )

- Note: Some things to know:
- Lightning calls .backward() and .step() on each optimizer as needed.
- If a learning rate scheduler is specified in configure_optimizers() with key "interval" (default "epoch") in the scheduler configuration, Lightning will call the scheduler's .step() method automatically in case of automatic optimization.
- If you use 16-bit precision (precision=16), Lightning will automatically handle the optimizers.
- If you use multiple optimizers, training_step() will have an additional optimizer_idx parameter.
- If you use torch.optim.LBFGS, Lightning handles the closure function automatically for you.
- If you use multiple optimizers, gradients will be calculated only for the parameters of the current optimizer at each training step.
- If you need to control how often those optimizers step or override the default .step() schedule, override the optimizer_step() hook.
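The optimizer-frequency cycling described for configure_optimizers() (5 steps with the first optimizer, 10 with the second, then repeat) reduces to modular arithmetic. The sketch below is a pure-Python illustration. active_optimizer is a hypothetical helper, not a Lightning API. It assumes every optimizer has a frequency, which is what the frequency note requires.

```python
from itertools import accumulate


def active_optimizer(step, frequencies):
    """Return the index of the optimizer that handles `step` (0-based).

    frequencies: number of consecutive steps given to each optimizer,
    e.g. [5, 10] means 5 steps for optimizer 0, then 10 for optimizer 1.
    """
    cycle_length = sum(frequencies)
    position = step % cycle_length  # where we are inside the current cycle
    # Cumulative boundaries, e.g. [5, 15] for [5, 10].
    for idx, boundary in enumerate(accumulate(frequencies)):
        if position < boundary:
            return idx
    raise AssertionError("unreachable: position is always < cycle_length")


schedule = [active_optimizer(s, [5, 10]) for s in range(16)]
# schedule -> [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
```

In the WGAN example (n_critic = 5, with the critic listed first), frequencies [5, 1] give five critic steps followed by one generator step.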
 
 - forward(u, n)
- Same as torch.nn.Module.forward().
- Parameters:
- *args – Whatever you decide to pass into the forward method. 
- **kwargs – Keyword arguments are also possible. 
 
- Returns:
- Your model’s output 
 
 - get_n()
- Get the total number of users and news articles.
- Returns:
- Number of users and number of news articles.
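Counts like these are typically used to size the user and news-article embedding tables. The actual method reads from data_dir in a way not documented here. As a hypothetical illustration only, the counts could be derived from (user_id, news_id) interaction pairs. count_users_and_news is an illustrative name.

```python
def count_users_and_news(interactions):
    """Return (n_users, n_news) for a list of (user_id, news_id) pairs.

    Counts distinct IDs, so it also works when IDs are not contiguous.
    """
    users = {user for user, _ in interactions}
    news = {item for _, item in interactions}
    return len(users), len(news)


pairs = [("U1", "N10"), ("U1", "N11"), ("U2", "N10"), ("U3", "N12")]
n_users, n_news = count_users_and_news(pairs)
```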
 
 - test_epoch_end(outputs)
- Called at the end of a test epoch with the outputs of all test steps.

    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)

- Parameters:
- outputs – List of outputs you defined in test_step_end(), or if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
- None
 - Note: If you didn't define a test_step(), this won't be called.
- Examples: With a single dataloader:

    def test_epoch_end(self, outputs):
        # do something with the outputs of all test batches
        # (assumes test_step returns a dict with a "predictions" entry)
        all_test_preds = [out["predictions"] for out in outputs]
        some_result = calc_all_results(all_test_preds)  # your own function
        self.log("some_result", some_result)

- With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each test step for that dataloader.

    def test_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            for test_step_out in dataloader_outputs:
                # do something
                final_value += test_step_out
        self.log("final_metric", final_value)
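To make the single- and multi-dataloader shapes concrete, here is a framework-free sketch that averages per-step values for each dataloader. It assumes, hypothetically, that each test step returned a plain float and that every dataloader produced at least one step. mean_per_dataloader is an illustrative helper, not part of Lightning.

```python
from statistics import mean


def mean_per_dataloader(outputs):
    """Average step outputs per dataloader.

    outputs: either a flat list (one dataloader) or a list of lists
    (one inner list per dataloader). Each list must be non-empty.
    """
    if outputs and isinstance(outputs[0], list):
        return [mean(dl_outputs) for dl_outputs in outputs]
    return [mean(outputs)]


single = mean_per_dataloader([0.5, 0.7, 0.9])               # one dataloader
multi = mean_per_dataloader([[1.0, 3.0], [2.0, 4.0, 6.0]])  # two dataloaders
```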
 - test_step(batch, batch_idx)
- Operates on a single batch of data from the test set. In this step you'd normally generate examples or calculate anything of interest such as accuracy.

    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)

- Parameters:
- batch – The output of your DataLoader.
- batch_idx – The index of this batch.
- dataloader_idx – The index of the dataloader that produced this batch (only if multiple test dataloaders are used).

- Returns:
- Any of:
- Any object or value
- None: testing will skip to the next batch

    # if you have one test dataloader:
    def test_step(self, batch, batch_idx):
        ...

    # if you have multiple test dataloaders:
    def test_step(self, batch, batch_idx, dataloader_idx=0):
        ...

- Examples:

    import torch
    import torchvision

    # CASE 1: A single test dataset
    def test_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image('example_images', grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        test_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({'test_loss': loss, 'test_acc': test_acc})

- If you pass in multiple test dataloaders, test_step() will have an additional argument. We recommend setting the default value of 0 so that you can quickly switch between single and multiple dataloaders.

    # CASE 2: multiple test dataloaders
    def test_step(self, batch, batch_idx, dataloader_idx=0):
        # dataloader_idx tells you which dataset this is.
        ...

- Note: If you don't need to test, you don't need to implement this method.
- Note: When test_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of the test epoch, the model goes back to training mode and gradients are enabled.
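The accuracy lines in CASE 1 (torch.argmax over dim=1, then the fraction of matching labels) amount to the pure-Python computation below. It only makes the arithmetic explicit. The logits are made up and contain no ties.

```python
def argmax_rows(logits):
    """Index of the largest value in each row (like torch.argmax(out, dim=1))."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def accuracy(logits, labels):
    """Fraction of rows whose argmax equals the true label."""
    labels_hat = argmax_rows(logits)
    correct = sum(int(p == y) for p, y in zip(labels_hat, labels))
    return correct / len(labels)


out = [[0.1, 2.0, -1.0],   # predicts class 1
       [3.0, 0.2, 0.1],    # predicts class 0
       [0.0, 0.1, 0.5],    # predicts class 2
       [1.0, 0.9, 0.8]]    # predicts class 0
y = [1, 0, 1, 0]
test_acc = accuracy(out, y)  # 3 of 4 correct
```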
 - test_step_end(outputs)
- Use this when testing with DP, because test_step() will operate on only part of the batch. However, this is still optional and only needed for things like softmax or NCE loss.
- Note: If you later switch to DDP or some other mode, this will still be called, so you don't have to change your code.

    # pseudocode
    sub_batches = split_batches_for_dp(batch)
    step_output = [test_step(sub_batch) for sub_batch in sub_batches]
    test_step_end(step_output)

- Parameters:
- outputs – What you return in test_step() for each batch part.
- Returns:
- None or anything

    # WITHOUT test_step_end
    # if used in DP, this batch is 1/num_gpus large
    def test_step(self, batch, batch_idx):
        # batch is 1/num_gpus big
        x, y = batch

        out = self(x)
        loss = self.softmax(out)
        self.log("test_loss", loss)

    # --------------
    # with test_step_end to do softmax over the full batch
    def test_step(self, batch, batch_idx):
        # batch is 1/num_gpus big
        x, y = batch

        out = self.encoder(x)
        return out

    def test_step_end(self, output_results):
        # this out is now the full size of the batch
        all_test_step_outs = output_results
        loss = nce_loss(all_test_step_outs)
        self.log("test_loss", loss)

- See also: the Multi GPU Training guide for more details.
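Why does the full batch matter? Any loss that normalizes across the batch, such as a softmax over the batch dimension in NCE-style or contrastive losses, gives different numbers when computed per DP shard. The pure-Python sketch below shows the effect with made-up scores and a hypothetical two-way split.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


full_batch = [2.0, 1.0, 0.5, 3.0]
# What two DP replicas would each see if the batch were split in half.
shards = [full_batch[:2], full_batch[2:]]

per_shard = [p for shard in shards for p in softmax(shard)]  # normalized per shard
gathered = softmax(full_batch)  # normalized over the full batch, as in *_step_end
```

Each shard's probabilities sum to 1 on their own, so the per-shard values do not form one distribution over the whole batch.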
 - training_step(batch, batch_idx)
- Here you compute and return the training loss and, optionally, additional metrics for the progress bar or logger.
- Parameters:
- batch (Tensor | (Tensor, …) | [Tensor, …]) – The output of your DataLoader. A tensor, tuple or list.
- batch_idx (int) – The index of this batch.
- optimizer_idx (int) – When using multiple optimizers, this argument will also be present.
- hiddens (Any) – Passed in if LightningModule.truncated_bptt_steps > 0.

- Returns:
- Any of:
- Tensor: the loss tensor
- dict: a dictionary. Can include any keys, but must include the key 'loss'
- None: training will skip to the next batch. This is only for automatic optimization.
- This is not supported for multi-GPU, TPU, IPU, or DeepSpeed.


 - In this step you'd normally do the forward pass and calculate the loss for a batch. You can also do fancier things like multiple forward passes or something model-specific.
- Example:

    def training_step(self, batch, batch_idx):
        x, y, z = batch
        out = self.encoder(x)
        loss = self.loss(out, x)
        return loss

- If you define multiple optimizers, this step will be called with an additional optimizer_idx parameter.

    # Multiple optimizers (e.g.: GANs)
    def training_step(self, batch, batch_idx, optimizer_idx):
        if optimizer_idx == 0:
            # do training_step with encoder
            ...
        if optimizer_idx == 1:
            # do training_step with decoder
            ...

- If you add truncated back-propagation through time, you will also get an additional argument with the hidden states of the previous step.

    # Truncated back-propagation through time
    def training_step(self, batch, batch_idx, hiddens):
        # hiddens are the hidden states from the previous truncated backprop step
        data, target = batch
        out, hiddens = self.lstm(data, hiddens)
        loss = ...
        return {"loss": loss, "hiddens": hiddens}

- Note: The loss value shown in the progress bar is smoothed (averaged) over the last values, so it differs from the actual loss returned in the train/validation step.
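The smoothing mentioned in the note can be pictured as a running mean over a fixed window of recent losses. The sketch below is not Lightning's implementation. The window size is a caller-chosen assumption, because Lightning's own window length is not stated here.

```python
from collections import deque


class RunningMean:
    """Mean of the last `window` values, like a smoothed progress-bar loss."""

    def __init__(self, window):
        self.values = deque(maxlen=window)  # old values fall off automatically

    def update(self, value):
        self.values.append(value)
        return sum(self.values) / len(self.values)


smoother = RunningMean(window=3)
shown = [smoother.update(loss) for loss in [4.0, 2.0, 3.0, 1.0]]
# shown -> [4.0, 3.0, 3.0, 2.0], while the raw losses were [4.0, 2.0, 3.0, 1.0]
```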
 - validation_epoch_end(outputs)
- Called at the end of the validation epoch with the outputs of all validation steps.

    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)

- Parameters:
- outputs – List of outputs you defined in validation_step(), or if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
- None
 - Note: If you didn't define a validation_step(), this won't be called.
- Examples: With a single dataloader:

    def validation_epoch_end(self, val_step_outputs):
        for out in val_step_outputs:
            ...

- With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each validation step for that dataloader.

    def validation_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            for val_step_out in dataloader_outputs:
                # do something, e.g. accumulate a metric
                final_value += val_step_out
        self.log("final_metric", final_value)
 - validation_step(batch, batch_idx)
- Operates on a single batch of data from the validation set. In this step you might generate examples or calculate anything of interest like accuracy.

    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)

- Parameters:
- batch – The output of your DataLoader.
- batch_idx – The index of this batch.
- dataloader_idx – The index of the dataloader that produced this batch (only if multiple val dataloaders are used).

- Returns:
- Any object or value
- None: validation will skip to the next batch

    # pseudocode of order
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        if defined("validation_step_end"):
            out = validation_step_end(out)
        val_outs.append(out)
    val_outs = validation_epoch_end(val_outs)

    # if you have one val dataloader:
    def validation_step(self, batch, batch_idx):
        ...

    # if you have multiple val dataloaders:
    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        ...

- Examples:

    import torch
    import torchvision

    # CASE 1: A single validation dataset
    def validation_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image('example_images', grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        val_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({'val_loss': loss, 'val_acc': val_acc})

- If you pass in multiple val dataloaders, validation_step() will have an additional argument. We recommend setting the default value of 0 so that you can quickly switch between single and multiple dataloaders.

    # CASE 2: multiple validation dataloaders
    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        # dataloader_idx tells you which dataset this is.
        ...

- Note: If you don't need to validate, you don't need to implement this method.
- Note: When validation_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of validation, the model goes back to training mode and gradients are enabled.
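The documented call order can be exercised without Lightning. In this sketch, run_validation and the toy hook functions are hypothetical stand-ins. validation_step_end is applied only when it is provided, and validation_epoch_end receives the collected outputs.

```python
def run_validation(val_data, validation_step, validation_epoch_end, validation_step_end=None):
    """Mimic the documented order: step -> optional step_end -> epoch_end."""
    val_outs = []
    for batch_idx, val_batch in enumerate(val_data):
        out = validation_step(val_batch, batch_idx)
        if validation_step_end is not None:
            out = validation_step_end(out)
        val_outs.append(out)
    return validation_epoch_end(val_outs)


# Toy hooks: each "batch" is a list of numbers.
def step(batch, batch_idx):
    return sum(batch)


def step_end(out):
    return out * 10


def epoch_end(outs):
    return outs


result_without = run_validation([[1, 2], [4]], step, epoch_end)
result_with = run_validation([[1, 2], [4]], step, epoch_end, step_end)
```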
 - validation_step_end(outputs)
- Use this when validating with DP, because validation_step() will operate on only part of the batch. However, this is still optional and only needed for things like softmax or NCE loss.
- Note: If you later switch to DDP or some other mode, this will still be called, so you don't have to change your code.

    # pseudocode
    sub_batches = split_batches_for_dp(batch)
    step_output = [validation_step(sub_batch) for sub_batch in sub_batches]
    validation_step_end(step_output)

- Parameters:
- outputs – What you return in validation_step() for each batch part.
- Returns:
- None or anything

    # WITHOUT validation_step_end
    # if used in DP, this batch is 1/num_gpus large
    def validation_step(self, batch, batch_idx):
        # batch is 1/num_gpus big
        x, y = batch

        out = self.encoder(x)
        loss = self.softmax(out)
        loss = nce_loss(loss)
        self.log("val_loss", loss)

    # --------------
    # with validation_step_end to do softmax over the full batch
    def validation_step(self, batch, batch_idx):
        # batch is 1/num_gpus big
        x, y = batch

        out = self(x)
        return out

    def validation_step_end(self, val_step_outputs):
        for out in val_step_outputs:
            ...

- See also: the Multi GPU Training guide for more details.
 
Content Based (BERT)
Advanced Models
MKR
- class src.models.MKR.MKR(dim, l, h, l2_weight, lr_rs, lr_kge, kge_interval, use_inner_product, data_dir, metrics={})
- MKR: Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation.
- Original paper by: Wang, H., Zhang, F., Zhao, M., Li, W., Xie, X., & Guo, M. (2019, May). Multi-task feature learning for knowledge graph enhanced recommendation. In The World Wide Web Conference (pp. 2000-2010).
- Implementation based on the PyTorch implementation from: https://github.com/hsientzucheng/MKR.PyTorch
- Parameters:
- dim (int) – Dimension for embeddings 
- l (int) – Number of low layers 
- h (int) – Number of high layers 
- l2_weight (float) – Weight of L2 regularization
- lr_rs (float) – Learning rate for ratings training
- lr_kge (float) – Learning rate for knowledge graph training
- kge_interval (int) – Knowledge graph embedding training interval
- use_inner_product (bool) – Whether to use the inner product of user and item embeddings for prediction
- data_dir (str) – Data directory 
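To give a feel for what use_inner_product and l2_weight might control, here is a hedged pure-Python sketch of one way a rating-task loss could combine them. It scores a user-item pair with an inner product squashed by a sigmoid, applies binary cross-entropy, and adds an L2 penalty weighted by l2_weight (taken here as half the sum of squares). This is an assumption loosely modeled on the MKR paper's recommendation module, not a description of src.models.MKR. The function names are illustrative, and the MLP alternative used when use_inner_product is false is omitted.

```python
import math


def inner_product_score(user_vec, item_vec):
    """Click probability from the inner product of two embeddings."""
    logit = sum(u * v for u, v in zip(user_vec, item_vec))
    return 1.0 / (1.0 + math.exp(-logit))


def rs_loss(user_vec, item_vec, label, l2_weight):
    """Binary cross-entropy plus an L2 penalty scaled by l2_weight.

    The penalty is half the sum of squared embedding entries (an assumption).
    """
    p = inner_product_score(user_vec, item_vec)
    bce = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    l2 = 0.5 * sum(x * x for x in user_vec + item_vec)
    return bce + l2_weight * l2


user = [0.5, -0.2]
item = [0.4, 0.1]
loss = rs_loss(user, item, label=1, l2_weight=1e-6)
```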
 
 - configure_optimizers()
- Choose what optimizers and learning-rate schedulers to use in your optimization. Normally you need one, but in the case of GANs or similar models you might have multiple.
- Returns:
- Any of these 6 options:
- Single optimizer.
- List or tuple of optimizers.
- Two lists: the first list has multiple optimizers, and the second has multiple LR schedulers (or multiple lr_scheduler_config dictionaries).
- Dictionary, with an "optimizer" key, and (optionally) an "lr_scheduler" key whose value is a single LR scheduler or lr_scheduler_config.
- Tuple of dictionaries as described above, with an optional "frequency" key.
- None: fit will run without any optimizer.
 
 - The lr_scheduler_config is a dictionary which contains the scheduler and its associated configuration. The default configuration is shown below.

    lr_scheduler_config = {
        # REQUIRED: The scheduler instance
        "scheduler": lr_scheduler,
        # The unit of the scheduler's step size, could also be 'step'.
        # 'epoch' updates the scheduler on epoch end whereas 'step'
        # updates it after an optimizer update.
        "interval": "epoch",
        # How many epochs/steps should pass between calls to
        # `scheduler.step()`. 1 corresponds to updating the learning
        # rate after every epoch/step.
        "frequency": 1,
        # Metric to monitor for schedulers like `ReduceLROnPlateau`
        "monitor": "val_loss",
        # If set to `True`, will enforce that the value specified in 'monitor'
        # is available when the scheduler is updated, thus stopping
        # training if not found. If set to `False`, it will only produce a warning
        "strict": True,
        # If using the `LearningRateMonitor` callback to monitor the
        # learning rate progress, this keyword can be used to specify
        # a custom logged name
        "name": None,
    }

- When there are schedulers in which the .step() method is conditioned on a value, such as the torch.optim.lr_scheduler.ReduceLROnPlateau scheduler, Lightning requires that the lr_scheduler_config contains the keyword "monitor" set to the metric name that the scheduler should be conditioned on.
- Metrics can be made available to monitor by simply logging them using self.log('metric_to_track', metric_val) in your LightningModule.
- Note: The frequency value specified in a dict along with the optimizer key is an int corresponding to the number of sequential batches optimized with the specific optimizer. It should be given to none or to all of the optimizers. There is a difference between passing multiple optimizers in a list and passing multiple optimizers in dictionaries with a frequency of 1:
- In the former case, all optimizers will operate on the given batch in each optimization step.
- In the latter, only one optimizer will operate on the given batch at every step.
 - This is different from the frequency value specified in the lr_scheduler_config mentioned above.

    import torch

    def configure_optimizers(self):
        optimizer_one = torch.optim.SGD(self.model.parameters(), lr=0.01)
        optimizer_two = torch.optim.SGD(self.model.parameters(), lr=0.01)
        return [
            {"optimizer": optimizer_one, "frequency": 5},
            {"optimizer": optimizer_two, "frequency": 10},
        ]

- In this example, the first optimizer will be used for the first 5 steps, the second optimizer for the next 10 steps, and that cycle will continue. If an LR scheduler is specified for an optimizer using the lr_scheduler key in the above dict, the scheduler will only be updated when its optimizer is being used.
- Examples:

    from torch.optim import Adam
    from torch.optim.lr_scheduler import CosineAnnealingLR, ExponentialLR

    # most cases. no learning rate scheduler
    def configure_optimizers(self):
        return Adam(self.parameters(), lr=1e-3)

    # multiple optimizer case (e.g.: GAN)
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        return gen_opt, dis_opt

    # example with learning rate schedulers
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        dis_sch = CosineAnnealingLR(dis_opt, T_max=10)
        return [gen_opt, dis_opt], [dis_sch]

    # example with step-based learning rate schedulers
    # each optimizer has its own scheduler
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        gen_sch = {
            'scheduler': ExponentialLR(gen_opt, 0.99),
            'interval': 'step'  # called after each training step
        }
        dis_sch = CosineAnnealingLR(dis_opt, T_max=10)  # called every epoch
        return [gen_opt, dis_opt], [gen_sch, dis_sch]

    # example with optimizer frequencies
    # see training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1
    # https://arxiv.org/abs/1704.00028
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        n_critic = 5
        return (
            {'optimizer': dis_opt, 'frequency': n_critic},
            {'optimizer': gen_opt, 'frequency': 1}
        )

- Note: Some things to know:
- Lightning calls .backward() and .step() on each optimizer as needed.
- If a learning rate scheduler is specified in configure_optimizers() with key "interval" (default "epoch") in the scheduler configuration, Lightning will call the scheduler's .step() method automatically in case of automatic optimization.
- If you use 16-bit precision (precision=16), Lightning will automatically handle the optimizers.
- If you use multiple optimizers, training_step() will have an additional optimizer_idx parameter.
- If you use torch.optim.LBFGS, Lightning handles the closure function automatically for you.
- If you use multiple optimizers, gradients will be calculated only for the parameters of the current optimizer at each training step.
- If you need to control how often those optimizers step or override the default .step() schedule, override the optimizer_step() hook.
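The hypothetical helper below shows how user-supplied keys might combine with the default lr_scheduler_config keys. It fills in the remaining keys and enforces the rule that value-conditioned schedulers (such as ReduceLROnPlateau) need a "monitor" key. It is illustrative only, not Lightning's internal code. One assumption differs from the documented block: "monitor" stays unset (None) unless the caller provides it, so the check has something to catch.

```python
DEFAULTS = {
    "interval": "epoch",
    "frequency": 1,
    "monitor": None,  # assumption: unset unless the caller supplies one
    "strict": True,
    "name": None,
}


def build_scheduler_config(scheduler, needs_metric=False, **overrides):
    """Merge user keys over the defaults and validate the result (illustrative)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown keys: {sorted(unknown)}")
    config = {"scheduler": scheduler, **DEFAULTS, **overrides}
    if config["interval"] not in ("epoch", "step"):
        raise ValueError("interval must be 'epoch' or 'step'")
    if needs_metric and config["monitor"] is None:
        raise ValueError("a value-conditioned scheduler needs a 'monitor' key")
    return config


# A string stands in for a real scheduler object here.
cfg = build_scheduler_config("plateau-scheduler", needs_metric=True, monitor="val_loss")
```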
 
 - forward(user_indices=None, item_indices=None, head_indices=None, relation_indices=None, tail_indices=None)
- Same as torch.nn.Module.forward().
- Parameters:
- *args – Whatever you decide to pass into the forward method. 
- **kwargs – Keyword arguments are also possible. 
 
- Returns:
- Your model’s output 
 
 - optimizer_step(epoch, batch_idx, optimizer, optimizer_idx, optimizer_closure, on_tpu=None, using_native_amp=None, using_lbfgs=None)
- Override this method to adjust the default way the Trainer calls each optimizer.
- By default, Lightning calls step() and zero_grad() once per optimizer, as shown in the example. This method (and zero_grad()) won't be called during the accumulation phase when Trainer(accumulate_grad_batches != 1). Overriding this hook has no benefit with manual optimization.
- Parameters:
- epoch – Current epoch
- batch_idx – Index of current batch
- optimizer – A PyTorch optimizer
- optimizer_idx – If you used multiple optimizers, this indexes into that list.
- optimizer_closure – The optimizer closure. This closure must be executed, as it includes the calls to training_step(), optimizer.zero_grad(), and backward().
- on_tpu – True if TPU backward is required
- using_native_amp – True if using native AMP
- using_lbfgs – True if the matching optimizer is torch.optim.LBFGS

 - Examples:

    # DEFAULT
    def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx,
                       optimizer_closure, on_tpu, using_native_amp, using_lbfgs):
        optimizer.step(closure=optimizer_closure)

    # Alternating schedule for optimizer steps (e.g.: GANs)
    def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx,
                       optimizer_closure, on_tpu, using_native_amp, using_lbfgs):
        # update generator opt every step
        if optimizer_idx == 0:
            optimizer.step(closure=optimizer_closure)

        # update discriminator opt every 2 steps
        if optimizer_idx == 1:
            if (batch_idx + 1) % 2 == 0:
                optimizer.step(closure=optimizer_closure)
            else:
                # call the closure by itself to run `training_step` + `backward` without an optimizer step
                optimizer_closure()

        # ...
        # add as many optimizers as you want

- Here's another example showing how to use this for more advanced things such as learning rate warm-up:

    # learning rate warm-up
    def optimizer_step(
        self,
        epoch,
        batch_idx,
        optimizer,
        optimizer_idx,
        optimizer_closure,
        on_tpu,
        using_native_amp,
        using_lbfgs,
    ):
        # update params
        optimizer.step(closure=optimizer_closure)

        # manually warm up lr without a scheduler
        if self.trainer.global_step < 500:
            lr_scale = min(1.0, float(self.trainer.global_step + 1) / 500.0)
            for pg in optimizer.param_groups:
                pg["lr"] = lr_scale * self.learning_rate
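The alternating schedule in the second example generalizes to "step optimizer 1 only every N batches". For MKR, one plausible use is stepping the knowledge-graph optimizer every kge_interval batches while the recommendation optimizer steps on every batch. This is an assumption, not documented behaviour. The pure-Python sketch below only computes the step decisions, and should_step is a hypothetical helper.

```python
def should_step(optimizer_idx, batch_idx, interval):
    """Decide whether optimizer `optimizer_idx` steps on batch `batch_idx`.

    Optimizer 0 steps every batch; optimizer 1 steps every `interval`
    batches, using the same (batch_idx + 1) % interval test as the example.
    When this returns False, the example still calls optimizer_closure()
    so that training_step and backward run without an optimizer step.
    """
    if optimizer_idx == 0:
        return True
    return (batch_idx + 1) % interval == 0


# With interval=2 this reproduces the GAN example: optimizer 1 steps on batches 1, 3, 5.
steps_opt1 = [b for b in range(6) if should_step(1, b, interval=2)]
# With a hypothetical kge_interval of 3, the second optimizer steps on batches 2 and 5.
steps_kge = [b for b in range(6) if should_step(1, b, interval=3)]
```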
 - test_epoch_end(outputs)
- Called at the end of a test epoch with the outputs of all test steps.

    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)

- Parameters:
- outputs – List of outputs you defined in test_step_end(), or if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
- None
 - Note: If you didn't define a test_step(), this won't be called.
- Examples: With a single dataloader:

    def test_epoch_end(self, outputs):
        # do something with the outputs of all test batches
        # (assumes test_step returns a dict with a "predictions" entry)
        all_test_preds = [out["predictions"] for out in outputs]
        some_result = calc_all_results(all_test_preds)  # your own function
        self.log("some_result", some_result)

- With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each test step for that dataloader.

    def test_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            for test_step_out in dataloader_outputs:
                # do something
                final_value += test_step_out
        self.log("final_metric", final_value)
 - test_step(batch, batch_idx)
- Operates on a single batch of data from the test set. In this step you'd normally generate examples or calculate anything of interest such as accuracy.

    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)

- Parameters:
- batch – The output of your DataLoader.
- batch_idx – The index of this batch.
- dataloader_idx – The index of the dataloader that produced this batch (only if multiple test dataloaders are used).

- Returns:
- Any of:
- Any object or value
- None: testing will skip to the next batch

    # if you have one test dataloader:
    def test_step(self, batch, batch_idx):
        ...

    # if you have multiple test dataloaders:
    def test_step(self, batch, batch_idx, dataloader_idx=0):
        ...

- Examples:

    import torch
    import torchvision

    # CASE 1: A single test dataset
    def test_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image('example_images', grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        test_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({'test_loss': loss, 'test_acc': test_acc})

- If you pass in multiple test dataloaders, test_step() will have an additional argument. We recommend setting the default value of 0 so that you can quickly switch between single and multiple dataloaders.

    # CASE 2: multiple test dataloaders
    def test_step(self, batch, batch_idx, dataloader_idx=0):
        # dataloader_idx tells you which dataset this is.
        ...

- Note: If you don't need to test, you don't need to implement this method.
- Note: When test_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of the test epoch, the model goes back to training mode and gradients are enabled.
 - test_step_end(outputs)
- Use this when testing with DP, because test_step() will operate on only part of the batch. However, this is still optional and only needed for things like softmax or NCE loss.
- Note: If you later switch to DDP or some other mode, this will still be called, so you don't have to change your code.

    # pseudocode
    sub_batches = split_batches_for_dp(batch)
    step_output = [test_step(sub_batch) for sub_batch in sub_batches]
    test_step_end(step_output)

- Parameters:
- outputs – What you return in test_step() for each batch part.
- Returns:
- None or anything

    # WITHOUT test_step_end
    # if used in DP, this batch is 1/num_gpus large
    def test_step(self, batch, batch_idx):
        # batch is 1/num_gpus big
        x, y = batch

        out = self(x)
        loss = self.softmax(out)
        self.log("test_loss", loss)

    # --------------
    # with test_step_end to do softmax over the full batch
    def test_step(self, batch, batch_idx):
        # batch is 1/num_gpus big
        x, y = batch

        out = self.encoder(x)
        return out

    def test_step_end(self, output_results):
        # this out is now the full size of the batch
        all_test_step_outs = output_results
        loss = nce_loss(all_test_step_outs)
        self.log("test_loss", loss)

- See also: the Multi GPU Training guide for more details.
 - training_step(batch, batch_idx, optimizer_idx)§
- Here you compute and return the training loss and some additional metrics for e.g. the progress bar or logger. - Parameters:
- batch ( - Tensor| (- Tensor, …) | [- Tensor, …]) – The output of your- DataLoader. A tensor, tuple or list.
- batch_idx ( - int) – Integer displaying index of this batch
- optimizer_idx ( - int) – When using multiple optimizers, this argument will also be present.
- hiddens ( - Any) – Passed in if :paramref:`~pytorch_lightning.core.module.LightningModule.truncated_bptt_steps` > 0.
 
- Returns:
- Any of: - Tensor - The loss tensor
- dict - A dictionary. Can include any keys, but must include the key 'loss'
- None - Training will skip to the next batch. This is only for automatic optimization.
- This is not supported for multi-GPU, TPU, IPU, or DeepSpeed. 
 
 
 - In this step you'd normally do the forward pass and calculate the loss for a batch. You can also do fancier things like multiple forward passes or something model specific.
 - Example:

    def training_step(self, batch, batch_idx):
        x, y, z = batch
        out = self.encoder(x)
        loss = self.loss(out, x)  # e.g. a reconstruction loss against the input
        return loss

 - If you define multiple optimizers, this step will be called with an additional optimizer_idx parameter.

    # Multiple optimizers (e.g.: GANs)
    def training_step(self, batch, batch_idx, optimizer_idx):
        if optimizer_idx == 0:
            # do training_step with encoder
            ...
        if optimizer_idx == 1:
            # do training_step with decoder
            ...

 - If you add truncated back propagation through time, you will also get an additional argument with the hidden states of the previous step.

    # Truncated back-propagation through time
    def training_step(self, batch, batch_idx, hiddens):
        # hiddens are the hidden states from the previous truncated backprop step
        x, y = batch
        out, hiddens = self.lstm(x, hiddens)
        loss = ...
        return {"loss": loss, "hiddens": hiddens}

 - Note: The loss value shown in the progress bar is smoothed (averaged) over the most recent values, so it differs from the actual loss returned in the training/validation step.
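To see why the progress-bar value differs from the loss returned by the step, here is a minimal sketch of a windowed running mean in plain Python. RunningMean and the window length used below are illustrative assumptions, not Lightning's internal class or its actual window size.

```python
from collections import deque


class RunningMean:
    # Mean of the most recent `window` values; older values fall out of the deque.
    def __init__(self, window=20):
        self.values = deque(maxlen=window)

    def append(self, value):
        self.values.append(value)

    def mean(self):
        return sum(self.values) / len(self.values) if self.values else None


losses = [4.0, 2.0, 1.0, 1.0]   # losses returned by four training steps
tracker = RunningMean(window=3)
shown = []                      # what a smoothed display would show after each step
for loss in losses:
    tracker.append(loss)
    shown.append(tracker.mean())
# The last shown value averages 2.0, 1.0 and 1.0, so it is not the final loss 1.0.
```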
 - validation_epoch_end(outputs)
- Called at the end of the validation epoch with the outputs of all validation steps.

    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)

- Parameters:
- outputs – List of outputs you defined in validation_step(), or if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
- None 
 - Note: If you didn't define a validation_step(), this won't be called.
 - Examples: With a single dataloader:

    def validation_epoch_end(self, val_step_outputs):
        for out in val_step_outputs:
            ...

 With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each validation step for that dataloader.

    def validation_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            # dataloader_outputs holds every validation_step output of one dataloader
            for val_step_out in dataloader_outputs:
                # do something, e.g. accumulate a value
                final_value += val_step_out
        self.log("final_metric", final_value)
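The list-of-lists shape described above can be mimicked in plain Python. collect_validation_outputs and fake_step are hypothetical helpers that only imitate the documented nesting (one inner list per dataloader, a flat list for a single dataloader); they are not Lightning's evaluation loop.

```python
def collect_validation_outputs(dataloaders, validation_step):
    # One inner list of step outputs per dataloader, in dataloader order.
    outputs = []
    for dataloader_idx, loader in enumerate(dataloaders):
        per_loader = []
        for batch_idx, batch in enumerate(loader):
            per_loader.append(validation_step(batch, batch_idx, dataloader_idx))
        outputs.append(per_loader)
    # With a single dataloader, the epoch-end hook receives the flat inner list.
    return outputs[0] if len(outputs) == 1 else outputs


def mean_per_dataloader(outputs):
    # Aggregate the step outputs of each dataloader separately.
    return [sum(outs) / len(outs) for outs in outputs]


def fake_step(batch, batch_idx, dataloader_idx):
    # Hypothetical validation_step: returns the batch mean as a "loss".
    return sum(batch) / len(batch)


loaders = [[[1.0, 3.0], [2.0, 2.0]], [[10.0, 20.0]]]
nested = collect_validation_outputs(loaders, fake_step)
per_loader_means = mean_per_dataloader(nested)
```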
 - validation_step(batch, batch_idx)
- Operates on a single batch of data from the validation set. In this step you might generate examples or calculate anything of interest, like accuracy.

    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)

- Parameters:
- batch – The output of your DataLoader.
- batch_idx – The index of this batch. 
- dataloader_idx – The index of the dataloader that produced this batch. (only if multiple val dataloaders used) 
 
- Returns:
- Any object or value 
- None - Validation will skip to the next batch
 
    # pseudocode of order
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        if defined("validation_step_end"):
            out = validation_step_end(out)
        val_outs.append(out)
    validation_epoch_end(val_outs)

    # if you have one val dataloader:
    def validation_step(self, batch, batch_idx):
        ...

    # if you have multiple val dataloaders:
    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        ...

 - Examples:

    # CASE 1: A single validation dataset
    import torch
    import torchvision

    def validation_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image('example_images', grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        val_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({'val_loss': loss, 'val_acc': val_acc})

 - If you pass in multiple val dataloaders, validation_step() will have an additional argument. We recommend setting the default value of 0 so that you can quickly switch between single and multiple dataloaders.

    # CASE 2: multiple validation dataloaders
    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        # dataloader_idx tells you which dataset this is.
        ...

 - Note: If you don't need to validate, you don't need to implement this method.
 - Note: When validation_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of validation, the model goes back to training mode and gradients are enabled.
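The call order in the pseudocode above can be made concrete with a small runnable simulation. run_validation_epoch and the two toy module classes are hypothetical; they follow the documented order (validation_step, then the optional validation_step_end, then validation_epoch_end) and record each call, but they are not Lightning's loop.

```python
def run_validation_epoch(module, val_data):
    # Follows the documented order and records which hooks fired.
    calls = []
    val_outs = []
    for batch_idx, batch in enumerate(val_data):
        out = module.validation_step(batch, batch_idx)
        calls.append(("validation_step", batch_idx))
        if hasattr(module, "validation_step_end"):
            out = module.validation_step_end(out)
            calls.append(("validation_step_end", batch_idx))
        val_outs.append(out)
    if hasattr(module, "validation_epoch_end"):
        module.validation_epoch_end(val_outs)
        calls.append(("validation_epoch_end", len(val_outs)))
    return calls


class WithStepEnd:
    # Toy module: doubles each batch, then adds 1 in validation_step_end.
    def validation_step(self, batch, batch_idx):
        return batch * 2

    def validation_step_end(self, out):
        return out + 1

    def validation_epoch_end(self, outs):
        self.epoch_outs = outs


class WithoutStepEnd:
    # Same toy module without the optional validation_step_end hook.
    def validation_step(self, batch, batch_idx):
        return batch * 2

    def validation_epoch_end(self, outs):
        self.epoch_outs = outs


with_end = WithStepEnd()
calls = run_validation_epoch(with_end, [1, 2])
without_end = WithoutStepEnd()
run_validation_epoch(without_end, [1, 2])
```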
 - validation_step_end(outputs)
- Use this when validating with DP, because validation_step() will operate on only part of the batch. However, this is still optional and only needed for things like softmax or NCE loss.
- Note: If you later switch to DDP or some other mode, this will still be called so that you don't have to change your code.

    # pseudocode
    sub_batches = split_batches_for_dp(batch)
    step_output = [validation_step(sub_batch) for sub_batch in sub_batches]
    validation_step_end(step_output)

- Parameters:
- step_output – What you return in validation_step() for each batch part.
- Returns:
- None or anything 
    # WITHOUT validation_step_end
    # if used in DP, this batch is 1/num_gpus large
    def validation_step(self, batch, batch_idx):
        # batch is 1/num_gpus big
        x, y = batch

        out = self.encoder(x)
        loss = self.softmax(out)
        loss = nce_loss(loss)
        self.log("val_loss", loss)

    # --------------
    # with validation_step_end to do softmax over the full batch
    def validation_step(self, batch, batch_idx):
        # batch is 1/num_gpus big
        x, y = batch

        out = self(x)
        return out

    def validation_step_end(self, val_step_outputs):
        for out in val_step_outputs:
            ...

 - See also: the Multi GPU Training guide for more details.
 
RippleNet
- class src.models.rippleNet.RippleNet(dim, n_hop, kge_weight, l2_weight, lr, n_memory, item_update_mode, using_all_hops, data_dir, metrics={})
- RippleNet: Deep end-to-end model using knowledge graphs for user preference propagation.
- Original paper: Wang, H., Zhang, F., Wang, J., Zhao, M., Li, W., Xie, X., & Guo, M. (2018, October). RippleNet: Propagating user preferences on the knowledge graph for recommender systems. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (pp. 417-426).
- Implementation based on the PyTorch RippleNet implementation from: https://github.com/qibinc/RippleNet-PyTorch
- Parameters:
- dim (int) – Dimension for embeddings 
- n_hop (int) – Maximum number of hops 
- kge_weight (float) – Weight of the knowledge graph embedding (KGE) loss term
- l2_weight (float) – Weight of L2 regularization
- lr (float) – Learning rate
- n_memory (int) – Size of the ripple set for each hop
- item_update_mode (str) – How to update the item embedding at the end of each hop
- using_all_hops (bool) – Whether to use outputs of all hops or just the last hop when making predictions 
- data_dir (str) – Data directory 
 
 - configure_optimizers()
- Choose what optimizers and learning-rate schedulers to use in your optimization. Normally you’d need one. But in the case of GANs or similar you might have multiple. - Returns:
- Any of these 6 options. - Single optimizer. 
- List or Tuple of optimizers. 
- Two lists - The first list has multiple optimizers, and the second has multiple LR schedulers (or multiple lr_scheduler_config).
- Dictionary, with an "optimizer" key, and (optionally) a "lr_scheduler" key whose value is a single LR scheduler or lr_scheduler_config.
- Tuple of dictionaries as described above, with an optional "frequency" key.
- None - Fit will run without any optimizer. 
 
 - The lr_scheduler_config is a dictionary which contains the scheduler and its associated configuration. The default configuration is shown below.

    lr_scheduler_config = {
        # REQUIRED: The scheduler instance
        "scheduler": lr_scheduler,
        # The unit of the scheduler's step size, could also be 'step'.
        # 'epoch' updates the scheduler on epoch end whereas 'step'
        # updates it after an optimizer update.
        "interval": "epoch",
        # How many epochs/steps should pass between calls to
        # `scheduler.step()`. 1 corresponds to updating the learning
        # rate after every epoch/step.
        "frequency": 1,
        # Metric to monitor for schedulers like `ReduceLROnPlateau`
        "monitor": "val_loss",
        # If set to `True`, will enforce that the value specified 'monitor'
        # is available when the scheduler is updated, thus stopping
        # training if not found. If set to `False`, it will only produce a warning
        "strict": True,
        # If using the `LearningRateMonitor` callback to monitor the
        # learning rate progress, this keyword can be used to specify
        # a custom logged name
        "name": None,
    }

 - When there are schedulers in which the .step() method is conditioned on a value, such as the torch.optim.lr_scheduler.ReduceLROnPlateau scheduler, Lightning requires that the lr_scheduler_config contains the keyword "monitor" set to the metric name that the scheduler should be conditioned on.
 - Metrics can be made available to monitor by simply logging them using self.log('metric_to_track', metric_val) in your LightningModule.
 - Note: The frequency value specified in a dict along with the optimizer key is an int corresponding to the number of sequential batches optimized with the specific optimizer. It should be given to none or to all of the optimizers. There is a difference between passing multiple optimizers in a list, and passing multiple optimizers in dictionaries with a frequency of 1:
- In the former case, all optimizers will operate on the given batch in each optimization step.
- In the latter, only one optimizer will operate on the given batch at every step. 
 - This is different from the frequency value specified in the lr_scheduler_config mentioned above.

    def configure_optimizers(self):
        optimizer_one = torch.optim.SGD(self.model.parameters(), lr=0.01)
        optimizer_two = torch.optim.SGD(self.model.parameters(), lr=0.01)
        return [
            {"optimizer": optimizer_one, "frequency": 5},
            {"optimizer": optimizer_two, "frequency": 10},
        ]

 - In this example, the first optimizer will be used for the first 5 steps, the second optimizer for the next 10 steps, and that cycle will continue. If an LR scheduler is specified for an optimizer using the lr_scheduler key in the above dict, the scheduler will only be updated when its optimizer is being used.
 - Examples:

    from torch.optim import Adam
    from torch.optim.lr_scheduler import CosineAnnealingLR, ExponentialLR

    # most cases. no learning rate scheduler
    def configure_optimizers(self):
        return Adam(self.parameters(), lr=1e-3)

    # multiple optimizer case (e.g.: GAN)
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        return gen_opt, dis_opt

    # example with learning rate schedulers
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        dis_sch = CosineAnnealingLR(dis_opt, T_max=10)
        return [gen_opt, dis_opt], [dis_sch]

    # example with step-based learning rate schedulers
    # each optimizer has its own scheduler
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        gen_sch = {
            'scheduler': ExponentialLR(gen_opt, 0.99),
            'interval': 'step'  # called after each training step
        }
        dis_sch = CosineAnnealingLR(dis_opt, T_max=10)  # called every epoch
        return [gen_opt, dis_opt], [gen_sch, dis_sch]

    # example with optimizer frequencies
    # see training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1
    # https://arxiv.org/abs/1704.00028
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        n_critic = 5
        return (
            {'optimizer': dis_opt, 'frequency': n_critic},
            {'optimizer': gen_opt, 'frequency': 1}
        )

 - Note: Some things to know:
- Lightning calls .backward() and .step() on each optimizer as needed.
- If a learning rate scheduler is specified in configure_optimizers() with key "interval" (default “epoch”) in the scheduler configuration, Lightning will call the scheduler's .step() method automatically in case of automatic optimization.
- If you use 16-bit precision (precision=16), Lightning will automatically handle the optimizers.
- If you use multiple optimizers, training_step() will have an additional optimizer_idx parameter.
- If you use torch.optim.LBFGS, Lightning handles the closure function automatically for you.
- If you use multiple optimizers, gradients will be calculated only for the parameters of the current optimizer at each training step.
- If you need to control how often those optimizers step or override the default .step() schedule, override the optimizer_step() hook.
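The cycling behaviour of the "frequency" key described above can be reproduced with a few lines of plain Python. active_optimizer is a hypothetical helper that follows the documented rule (each optimizer handles its frequency in consecutive batches, then the cycle repeats); it is not Lightning's internal implementation.

```python
from itertools import accumulate


def active_optimizer(step, frequencies):
    # Each optimizer runs for `frequency` consecutive batches, then the next one
    # takes over; after the last optimizer the cycle starts again.
    pos = step % sum(frequencies)
    for idx, boundary in enumerate(accumulate(frequencies)):
        if pos < boundary:
            return idx
    raise ValueError("frequencies must be positive integers")


# Frequencies 5 and 10: optimizer 0 for steps 0-4, optimizer 1 for steps 5-14, repeat.
schedule = [active_optimizer(s, [5, 10]) for s in range(16)]
# WGAN-style n_critic = 5: five discriminator steps per generator step.
wgan = [active_optimizer(s, [5, 1]) for s in range(7)]
```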
 
 - create_knowledge_graph()
- Create the knowledge graph for use in model training. - Returns:
- n_entity (int) – number of entities
- n_relation (int) – number of relations
- kg (ndarray) – knowledge graph
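A minimal sketch of the three return values, under stated assumptions: the input is a list of (head, relation, tail) integer ids, and kg is built as a head-indexed adjacency dict of (tail, relation) pairs instead of the ndarray documented above. That layout resembles RippleNet reference code, but whether create_knowledge_graph() does the same is an assumption; build_knowledge_graph is a hypothetical name.

```python
from collections import defaultdict


def build_knowledge_graph(triples):
    # triples: iterable of (head, relation, tail) integer ids.
    triples = list(triples)
    heads = {h for h, _, _ in triples}
    tails = {t for _, _, t in triples}
    n_entity = len(heads | tails)                   # distinct heads and tails
    n_relation = len({r for _, r, _ in triples})    # distinct relation ids
    # Index triples by head entity for neighbour lookup (assumed layout).
    kg = defaultdict(list)
    for h, r, t in triples:
        kg[h].append((t, r))
    return n_entity, n_relation, kg


n_entity, n_relation, kg = build_knowledge_graph([(0, 0, 1), (1, 1, 2), (0, 1, 2)])
```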
 
 - create_ripple_set()
- Create a ripple set for preference propagation. - Returns:
- ripple_set (defaultdict) – contains memories_h, memories_r and memories_t for each user, where memories_h (list) holds the head entities of the knowledge graph triples, memories_r (list) their relations, and memories_t (list) their tail entities.
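The ripple-set construction can be sketched in plain Python, loosely following the RippleNet paper: hop 0 starts from the user's historical items, each later hop collects the knowledge-graph triples whose head is one of the previous hop's tails, and n_memory triples are sampled per hop. The kg layout (head mapped to (tail, relation) pairs), sampling with replacement only when fewer than n_memory triples exist, and reusing the previous hop when nothing is reachable are assumptions modelled on RippleNet reference code, not confirmed details of create_ripple_set(); build_ripple_set is a hypothetical name.

```python
import random
from collections import defaultdict


def build_ripple_set(kg, user_history, n_hop, n_memory, seed=0):
    # kg: head -> list of (tail, relation); user_history: user -> seed item ids.
    # Returns user -> list of n_hop (memories_h, memories_r, memories_t) tuples.
    rng = random.Random(seed)
    ripple_set = defaultdict(list)
    for user, items in user_history.items():
        for hop in range(n_hop):
            # Hop 0 starts from the user's items; later hops from the last hop's tails.
            seeds = items if hop == 0 else ripple_set[user][-1][2]
            memories_h, memories_r, memories_t = [], [], []
            for h in seeds:
                for t, r in kg.get(h, []):
                    memories_h.append(h)
                    memories_r.append(r)
                    memories_t.append(t)
            if not memories_h:
                if not ripple_set[user]:
                    break  # nothing reachable at all for this user
                ripple_set[user].append(ripple_set[user][-1])  # reuse previous hop
                continue
            n = len(memories_h)
            # Sample with replacement only when fewer than n_memory triples exist.
            if n < n_memory:
                idx = rng.choices(range(n), k=n_memory)
            else:
                idx = rng.sample(range(n), n_memory)
            ripple_set[user].append((
                [memories_h[i] for i in idx],
                [memories_r[i] for i in idx],
                [memories_t[i] for i in idx],
            ))
    return ripple_set


kg = {0: [(1, 0)], 1: [(2, 1)]}
rs = build_ripple_set(kg, {"u": [0]}, n_hop=2, n_memory=3)
```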
 
 - forward(items: LongTensor, labels: LongTensor, memories_h: list, memories_r: list, memories_t: list)
- Same as torch.nn.Module.forward(). - Parameters:
- items (LongTensor) – the items in the batch
- labels (LongTensor) – the labels for those items
- memories_h, memories_r, memories_t (list) – head entities, relations and tail entities of the users' ripple sets, as returned by get_feed_dict()
 
- Returns:
- Your model’s output 
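The propagation described in the RippleNet paper cited above can be sketched for a single user-item pair with plain Python lists. At each hop the item embedding attends over the ripple-set triples with p_i = softmax(v^T R_i h_i), the hop response is o = sum_i p_i * t_i, and the click probability is sigmoid(u^T v), where u is the sum of the hop responses (or only the last one when using_all_hops is false). The mode names "replace" and "plus" for item_update_mode, and scoring against the original item embedding v, are assumptions; ripple_score is a hypothetical function, not this class's forward().

```python
import math


def matvec(m, x):
    # Matrix-vector product for lists of lists.
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def ripple_score(v, hops, relation_mats, item_update_mode="replace", using_all_hops=True):
    # v: item embedding; hops: per hop, a list of (h_emb, relation_id, t_emb);
    # relation_mats: relation_id -> d x d matrix R.
    item = list(v)
    o_list = []
    for triples in hops:
        # Relevance of each triple to the current item representation.
        probs = softmax([dot(item, matvec(relation_mats[r], h)) for h, r, _ in triples])
        # Hop response: probability-weighted sum of tail embeddings.
        o = [sum(p * t[k] for p, (_, _, t) in zip(probs, triples)) for k in range(len(v))]
        o_list.append(o)
        # Item update between hops (assumed mode names).
        if item_update_mode == "replace":
            item = o
        elif item_update_mode == "plus":
            item = [a + b for a, b in zip(item, o)]
        else:
            raise ValueError(f"unsupported item_update_mode: {item_update_mode}")
    user = [sum(col) for col in zip(*o_list)] if using_all_hops else o_list[-1]
    return 1.0 / (1.0 + math.exp(-dot(user, v)))


R = {0: [[1.0, 0.0], [0.0, 1.0]]}
two_hops = [[([1.0, 0.0], 0, [2.0, 0.0])], [([1.0, 0.0], 0, [1.0, 0.0])]]
p_all = ripple_score([1.0, 0.0], two_hops, R, using_all_hops=True)
p_last = ripple_score([1.0, 0.0], two_hops, R, using_all_hops=False)
```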
 
 - get_feed_dict(batch)
- Build the feed dict for a batch. - Parameters:
- batch – a batch from the DataLoader
- Returns:
- items, labels, memories_h (list), memories_r (list), memories_t (list)
 
 - prepare_rating_data_train()
- Create a dict containing user ratings. - Returns: user_history_dict (dict) – user ratings history
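A minimal sketch of a user rating history, assuming the ratings arrive as (user, item, label) triples with binary labels and that only positive interactions (label 1) count as history, as in RippleNet reference code. build_user_history is a hypothetical name; this library's prepare_rating_data_train() may read and filter its data differently.

```python
from collections import defaultdict


def build_user_history(ratings):
    # Map each user to the items they interacted with positively (label == 1).
    history = defaultdict(list)
    for user, item, label in ratings:
        if label == 1:
            history[user].append(item)
    return dict(history)


user_history_dict = build_user_history([(0, 10, 1), (0, 11, 0), (1, 12, 1), (0, 13, 1)])
```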
 - test_epoch_end(outputs)
- Called at the end of a test epoch with the output of all test steps.

    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)

- Parameters:
- outputs – List of outputs you defined in test_step_end(), or if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
- None 
 - Note: If you didn't define a test_step(), this won't be called.
 - Examples: With a single dataloader:

    def test_epoch_end(self, outputs):
        # do something with the outputs of all test batches
        # (assumes test_step returned a dict with a "predictions" entry;
        # calc_all_results is a placeholder for your own function)
        all_test_preds = [out["predictions"] for out in outputs]
        some_result = calc_all_results(all_test_preds)
        self.log("some_result", some_result)

 With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each test step for that dataloader.

    def test_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            for test_step_out in dataloader_outputs:
                # do something
                final_value += test_step_out
        self.log("final_metric", final_value)
 - test_step(batch, batch_idx)
- Operates on a single batch of data from the test set. In this step you'd normally generate examples or calculate anything of interest such as accuracy.

    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)

- Parameters:
- batch – The output of your DataLoader.
- batch_idx – The index of this batch.
- dataloader_idx – The index of the dataloader that produced this batch (only if multiple test dataloaders are used).
 
- Returns:
- Any of: - Any object or value
- None - Testing will skip to the next batch
 
    # if you have one test dataloader:
    def test_step(self, batch, batch_idx):
        ...

    # if you have multiple test dataloaders:
    def test_step(self, batch, batch_idx, dataloader_idx=0):
        ...

 - Examples:

    # CASE 1: A single test dataset
    import torch
    import torchvision

    def test_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image('example_images', grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        test_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({'test_loss': loss, 'test_acc': test_acc})

 - If you pass in multiple test dataloaders, test_step() will have an additional argument. We recommend setting the default value of 0 so that you can quickly switch between single and multiple dataloaders.

    # CASE 2: multiple test dataloaders
    def test_step(self, batch, batch_idx, dataloader_idx=0):
        # dataloader_idx tells you which dataset this is.
        ...

 - Note: If you don't need to test, you don't need to implement this method.
 - Note: When test_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of the test epoch, the model goes back to training mode and gradients are enabled.
 - test_step_end(outputs)
- Use this when testing with DP, because test_step() will operate on only part of the batch. However, this is still optional and only needed for things like softmax or NCE loss.
- Note: If you later switch to DDP or some other mode, this will still be called so that you don't have to change your code.

    # pseudocode
    sub_batches = split_batches_for_dp(batch)
    step_output = [test_step(sub_batch) for sub_batch in sub_batches]
    test_step_end(step_output)

- Parameters:
- step_output – What you return in test_step() for each batch part.
- Returns:
- None or anything 
    # WITHOUT test_step_end
    # if used in DP, this batch is 1/num_gpus large
    def test_step(self, batch, batch_idx):
        # batch is 1/num_gpus big
        x, y = batch

        out = self(x)
        loss = self.softmax(out)
        self.log("test_loss", loss)

    # --------------
    # with test_step_end to do softmax over the full batch
    def test_step(self, batch, batch_idx):
        # batch is 1/num_gpus big
        x, y = batch

        out = self.encoder(x)
        return out

    def test_step_end(self, output_results):
        # output_results now holds the outputs gathered from all sub-batches,
        # i.e. the full batch (nce_loss is a placeholder for your own loss)
        loss = nce_loss(output_results)
        self.log("test_loss", loss)

 - See also: the Multi GPU Training guide for more details.
 - training_step(batch, batch_idx)
- Here you compute and return the training loss and some additional metrics, e.g. for the progress bar or logger. - Parameters:
- batch (Tensor | (Tensor, …) | [Tensor, …]) – The output of your DataLoader. A tensor, tuple or list.
- batch_idx (int) – Integer displaying the index of this batch
- optimizer_idx (int) – When using multiple optimizers, this argument will also be present.
- hiddens (Any) – Passed in if LightningModule.truncated_bptt_steps > 0.
 
- Returns:
- Any of: - Tensor - The loss tensor
- dict - A dictionary. Can include any keys, but must include the key 'loss'
- None - Training will skip to the next batch. This is only for automatic optimization.
- This is not supported for multi-GPU, TPU, IPU, or DeepSpeed. 
 
 
 - In this step you'd normally do the forward pass and calculate the loss for a batch. You can also do fancier things like multiple forward passes or something model specific.
 - Example:

    def training_step(self, batch, batch_idx):
        x, y, z = batch
        out = self.encoder(x)
        loss = self.loss(out, x)  # e.g. a reconstruction loss against the input
        return loss

 - If you define multiple optimizers, this step will be called with an additional optimizer_idx parameter.

    # Multiple optimizers (e.g.: GANs)
    def training_step(self, batch, batch_idx, optimizer_idx):
        if optimizer_idx == 0:
            # do training_step with encoder
            ...
        if optimizer_idx == 1:
            # do training_step with decoder
            ...

 - If you add truncated back propagation through time, you will also get an additional argument with the hidden states of the previous step.

    # Truncated back-propagation through time
    def training_step(self, batch, batch_idx, hiddens):
        # hiddens are the hidden states from the previous truncated backprop step
        x, y = batch
        out, hiddens = self.lstm(x, hiddens)
        loss = ...
        return {"loss": loss, "hiddens": hiddens}

 - Note: The loss value shown in the progress bar is smoothed (averaged) over the most recent values, so it differs from the actual loss returned in the training/validation step.
 - validation_epoch_end(outputs)
- Called at the end of the validation epoch with the outputs of all validation steps.

    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)

- Parameters:
- outputs – List of outputs you defined in validation_step(), or if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
- None 
 - Note: If you didn't define a validation_step(), this won't be called.
 - Examples: With a single dataloader:

    def validation_epoch_end(self, val_step_outputs):
        for out in val_step_outputs:
            ...

 With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each validation step for that dataloader.

    def validation_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            # dataloader_outputs holds every validation_step output of one dataloader
            for val_step_out in dataloader_outputs:
                # do something, e.g. accumulate a value
                final_value += val_step_out
        self.log("final_metric", final_value)
 - validation_step(batch, batch_idx)
- Operates on a single batch of data from the validation set. In this step you might generate examples or calculate anything of interest, like accuracy.

    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)

- Parameters:
- batch – The output of your DataLoader.
- batch_idx – The index of this batch. 
- dataloader_idx – The index of the dataloader that produced this batch. (only if multiple val dataloaders used) 
 
- Returns:
- Any object or value 
- None - Validation will skip to the next batch
 
    # pseudocode of order
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        if defined("validation_step_end"):
            out = validation_step_end(out)
        val_outs.append(out)
    validation_epoch_end(val_outs)

    # if you have one val dataloader:
    def validation_step(self, batch, batch_idx):
        ...

    # if you have multiple val dataloaders:
    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        ...

 - Examples:

    # CASE 1: A single validation dataset
    import torch
    import torchvision

    def validation_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image('example_images', grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        val_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({'val_loss': loss, 'val_acc': val_acc})

 - If you pass in multiple val dataloaders, validation_step() will have an additional argument. We recommend setting the default value of 0 so that you can quickly switch between single and multiple dataloaders.

    # CASE 2: multiple validation dataloaders
    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        # dataloader_idx tells you which dataset this is.
        ...

 - Note: If you don't need to validate, you don't need to implement this method.
 - Note: When validation_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of validation, the model goes back to training mode and gradients are enabled.
 - validation_step_end(outputs)
- Use this when validating with DP, because validation_step() will operate on only part of the batch. However, this is still optional and only needed for things like softmax or NCE loss.
- Note: If you later switch to DDP or some other mode, this will still be called so that you don't have to change your code.

    # pseudocode
    sub_batches = split_batches_for_dp(batch)
    step_output = [validation_step(sub_batch) for sub_batch in sub_batches]
    validation_step_end(step_output)

- Parameters:
- step_output – What you return in validation_step() for each batch part.
- Returns:
- None or anything 
    # WITHOUT validation_step_end
    # if used in DP, this batch is 1/num_gpus large
    def validation_step(self, batch, batch_idx):
        # batch is 1/num_gpus big
        x, y = batch

        out = self.encoder(x)
        loss = self.softmax(out)
        loss = nce_loss(loss)
        self.log("val_loss", loss)

    # --------------
    # with validation_step_end to do softmax over the full batch
    def validation_step(self, batch, batch_idx):
        # batch is 1/num_gpus big
        x, y = batch

        out = self(x)
        return out

    def validation_step_end(self, val_step_outputs):
        for out in val_step_outputs:
            ...

 - See also: the Multi GPU Training guide for more details.
 
DKN
- class src.models.DKN.DKN(learning_rate, num_words, word_embedding_dim, use_context, num_entities, entity_embedding_dim, num_filters, window_sizes, query_vector_dim, num_clicked_news_a_user, pretrained_word_embedding_path, pretrained_entity_embedding_path, pretrained_context_embedding_path, metrics={})
- Deep knowledge-aware network. Takes 1 + K candidate news and a list of the user's clicked news as input, and produces the click probability. - configure_optimizers()
- Choose what optimizers and learning-rate schedulers to use in your optimization. Normally you’d need one. But in the case of GANs or similar you might have multiple. - Returns:
- Any of these 6 options. - Single optimizer. 
- List or Tuple of optimizers. 
- Two lists - The first list has multiple optimizers, and the second has multiple LR schedulers (or multiple lr_scheduler_config).
- Dictionary, with an "optimizer" key, and (optionally) a "lr_scheduler" key whose value is a single LR scheduler or lr_scheduler_config.
- Tuple of dictionaries as described above, with an optional "frequency" key.
- None - Fit will run without any optimizer. 
 
 - The lr_scheduler_config is a dictionary which contains the scheduler and its associated configuration. The default configuration is shown below.

    lr_scheduler_config = {
        # REQUIRED: The scheduler instance
        "scheduler": lr_scheduler,
        # The unit of the scheduler's step size, could also be 'step'.
        # 'epoch' updates the scheduler on epoch end whereas 'step'
        # updates it after an optimizer update.
        "interval": "epoch",
        # How many epochs/steps should pass between calls to
        # `scheduler.step()`. 1 corresponds to updating the learning
        # rate after every epoch/step.
        "frequency": 1,
        # Metric to monitor for schedulers like `ReduceLROnPlateau`
        "monitor": "val_loss",
        # If set to `True`, will enforce that the value specified 'monitor'
        # is available when the scheduler is updated, thus stopping
        # training if not found. If set to `False`, it will only produce a warning
        "strict": True,
        # If using the `LearningRateMonitor` callback to monitor the
        # learning rate progress, this keyword can be used to specify
        # a custom logged name
        "name": None,
    }

 - When there are schedulers in which the .step() method is conditioned on a value, such as the torch.optim.lr_scheduler.ReduceLROnPlateau scheduler, Lightning requires that the lr_scheduler_config contains the keyword "monitor" set to the metric name that the scheduler should be conditioned on.
 - Metrics can be made available to monitor by simply logging them using self.log('metric_to_track', metric_val) in your LightningModule.
 - Note: The frequency value specified in a dict along with the optimizer key is an int corresponding to the number of sequential batches optimized with the specific optimizer. It should be given to none or to all of the optimizers. There is a difference between passing multiple optimizers in a list, and passing multiple optimizers in dictionaries with a frequency of 1:
- In the former case, all optimizers will operate on the given batch in each optimization step.
- In the latter, only one optimizer will operate on the given batch at every step. 
 - This is different from the frequency value specified in the lr_scheduler_config mentioned above.

    def configure_optimizers(self):
        optimizer_one = torch.optim.SGD(self.model.parameters(), lr=0.01)
        optimizer_two = torch.optim.SGD(self.model.parameters(), lr=0.01)
        return [
            {"optimizer": optimizer_one, "frequency": 5},
            {"optimizer": optimizer_two, "frequency": 10},
        ]

 - In this example, the first optimizer will be used for the first 5 steps, the second optimizer for the next 10 steps, and that cycle will continue. If an LR scheduler is specified for an optimizer using the lr_scheduler key in the above dict, the scheduler will only be updated when its optimizer is being used.
 - Examples:

    from torch.optim import Adam
    from torch.optim.lr_scheduler import CosineAnnealingLR, ExponentialLR

    # most cases. no learning rate scheduler
    def configure_optimizers(self):
        return Adam(self.parameters(), lr=1e-3)

    # multiple optimizer case (e.g.: GAN)
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        return gen_opt, dis_opt

    # example with learning rate schedulers
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        dis_sch = CosineAnnealingLR(dis_opt, T_max=10)
        return [gen_opt, dis_opt], [dis_sch]

    # example with step-based learning rate schedulers
    # each optimizer has its own scheduler
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        gen_sch = {
            'scheduler': ExponentialLR(gen_opt, 0.99),
            'interval': 'step'  # called after each training step
        }
        dis_sch = CosineAnnealingLR(dis_opt, T_max=10)  # called every epoch
        return [gen_opt, dis_opt], [gen_sch, dis_sch]

    # example with optimizer frequencies
    # see training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1
    # https://arxiv.org/abs/1704.00028
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        n_critic = 5
        return (
            {'optimizer': dis_opt, 'frequency': n_critic},
            {'optimizer': gen_opt, 'frequency': 1}
        )

 - Note: Some things to know:
- Lightning calls .backward() and .step() on each optimizer as needed.
- If a learning rate scheduler is specified in configure_optimizers() with key "interval" (default “epoch”) in the scheduler configuration, Lightning will call the scheduler's .step() method automatically in case of automatic optimization.
- If you use 16-bit precision (precision=16), Lightning will automatically handle the optimizers.
- If you use multiple optimizers, training_step() will have an additional optimizer_idx parameter.
- If you use torch.optim.LBFGS, Lightning handles the closure function automatically for you.
- If you use multiple optimizers, gradients will be calculated only for the parameters of the current optimizer at each training step.
- If you need to control how often those optimizers step or override the default .step() schedule, override the optimizer_step() hook.
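Why a value-conditioned scheduler needs "monitor" can be illustrated with a framework-free sketch. PlateauScheduler is a deliberately simplified imitation of ReduceLROnPlateau in "min" mode (no threshold or cooldown), and step_monitored_scheduler is a hypothetical helper that mimics the documented "monitor"/"strict" behaviour by looking the metric up among logged values. Neither is Lightning's or PyTorch's code.

```python
import warnings


class PlateauScheduler:
    # Multiply the learning rate by `factor` once the metric has not improved
    # for more than `patience` consecutive steps (simplified "min" mode).
    def __init__(self, lr, factor=0.1, patience=2):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.num_bad_steps = 0

    def step(self, metric):
        if metric < self.best:
            self.best = metric
            self.num_bad_steps = 0
        else:
            self.num_bad_steps += 1
        if self.num_bad_steps > self.patience:
            self.lr *= self.factor
            self.num_bad_steps = 0
        return self.lr


def step_monitored_scheduler(config, logged_metrics):
    # Feed the metric named by config["monitor"] to the scheduler.
    name = config["monitor"]
    if name not in logged_metrics:
        if config.get("strict", True):
            raise RuntimeError(f"monitored metric {name!r} was not logged")
        warnings.warn(f"monitored metric {name!r} was not logged; skipping step")
        return None
    return config["scheduler"].step(logged_metrics[name])


config = {"scheduler": PlateauScheduler(1.0, factor=0.5, patience=1),
          "monitor": "val_loss", "strict": True}
lrs = [step_monitored_scheduler(config, {"val_loss": v}) for v in [1.0, 0.9, 0.95, 0.95]]
```

With strict set to True a missing metric stops the run, while strict set to False only warns, matching the comments in the default lr_scheduler_config.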
 
 - forward(candidate_news, clicked_news)
- Parameters:
- candidate_news – - [
- {
- “title”: batch_size * num_words_title, “title_entities”: batch_size * num_words_title 
 - } * (1 + K) 
 - ] 
- clicked_news – - [
- {
- “title”: batch_size * num_words_title, “title_entities”: batch_size * num_words_title 
 - } * num_clicked_news_a_user 
 - ] 
 
- Returns:
- click_probability, with shape batch_size
 
 - get_news_vector(news)
- Parameters:
- news – - {
- “title”: batch_size * num_words_title, “title_entities”: batch_size * num_words_title 
 - } 
- Returns:
- (shape) batch_size, len(window_sizes) * num_filters 
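As a rough illustration of where the output width len(window_sizes) * num_filters comes from, the sketch below runs a KCNN-style convolution in plain Python: word and entity embeddings are aligned by position, each filter of each window size slides over the title, and max-over-time pooling keeps one value per filter. Random filter weights, ReLU as the nonlinearity, and leaving out a bias term and DKN's entity transformation are simplifying assumptions; kcnn_news_vector is a hypothetical function, not this class's get_news_vector().

```python
import random


def kcnn_news_vector(word_vecs, entity_vecs, window_sizes, num_filters, seed=0):
    # word_vecs / entity_vecs: one embedding per title position (same length).
    # Returns len(window_sizes) * num_filters pooled features.
    rng = random.Random(seed)
    # Concatenating the word and entity channel at each position is equivalent
    # to a two-channel kernel that spans both channels.
    rows = [w + e for w, e in zip(word_vecs, entity_vecs)]
    dim = len(rows[0])
    features = []
    for width in window_sizes:
        if width > len(rows):
            raise ValueError("title is shorter than the convolution window")
        for _ in range(num_filters):
            # Random weights stand in for learned filter parameters.
            kernel = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(width)]
            responses = []
            for start in range(len(rows) - width + 1):
                window = rows[start:start + width]
                s = sum(k * x for krow, xrow in zip(kernel, window) for k, x in zip(krow, xrow))
                responses.append(max(0.0, s))  # ReLU
            features.append(max(responses))  # max-over-time pooling
    return features


rng = random.Random(1)
words = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(5)]
entities = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(5)]
vec = kcnn_news_vector(words, entities, window_sizes=[1, 2, 3], num_filters=4)
```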
 
 - get_prediction(candidate_news_vector, clicked_news_vector)§
- Parameters:
- candidate_news_vector – candidate_size, len(window_sizes) * num_filters 
- clicked_news_vector – num_clicked_news_a_user, len(window_sizes) * num_filters 
 
- Returns:
- click_probability, a 0-dim tensor
 
 - get_user_vector(clicked_news_vector)§
- Parameters:
- clicked_news_vector – batch_size, num_clicked_news_a_user, len(window_sizes) * num_filters 
- Returns:
- (shape) batch_size, num_clicked_news_a_user, len(window_sizes) * num_filters 
 
 - on_test_start()§
- Called at the beginning of testing. 
 - on_train_start()§
- Called at the beginning of training after sanity check. 
 - on_validation_start()§
- Called at the beginning of validation. 
 - test_epoch_end(output)§
- Called at the end of a test epoch with the outputs of all test steps.

    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)

- Parameters:
- outputs – List of outputs you defined in test_step_end(), or, if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
- None
- Note
- If you didn’t define a test_step(), this won’t be called.
- Examples
- With a single dataloader:

    def test_epoch_end(self, outputs):
        # do something with the outputs of all test batches;
        # assumes each test_step() returned a dict with a "predictions" entry
        all_test_preds = [out["predictions"] for out in outputs]
        some_result = calc_all_results(all_test_preds)  # placeholder for your own computation
        self.log("some_result", some_result)

- With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each test step for that dataloader.

    def test_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            for test_step_out in dataloader_outputs:
                # do something
                final_value += test_step_out
        self.log("final_metric", final_value)
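The nested structure passed to test_epoch_end() with multiple dataloaders (one inner list per dataloader, one entry per test step) can be reduced per dataloader. Below is a minimal, framework-free sketch, assuming each test step produced a single float; the function name and the key format are hypothetical, and in a LightningModule you would pass each value to self.log instead of returning a dict.

```python
from typing import Dict, List


def mean_per_dataloader(outputs: List[List[float]]) -> Dict[str, float]:
    """Average the per-step values collected for each test dataloader.

    `outputs` mirrors what test_epoch_end() receives with multiple dataloaders:
    outputs[dataloader_idx][step] -> value returned by that test step.
    """
    results = {}
    for dataloader_idx, dataloader_outputs in enumerate(outputs):
        if not dataloader_outputs:  # skip dataloaders that produced no outputs
            continue
        mean_value = sum(dataloader_outputs) / len(dataloader_outputs)
        results[f"test_metric/dataloader_{dataloader_idx}"] = mean_value
    return results


if __name__ == "__main__":
    nested = [[0.5, 0.7], [0.2, 0.4, 0.6]]  # 2 dataloaders, toy values
    print(mean_per_dataloader(nested))
```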
 - test_step(batch, batch_idx)§
- Operates on a single batch of data from the test set. In this step you’d normally generate examples or calculate anything of interest, such as accuracy.

    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)

- Parameters:
- batch – The output of your DataLoader.
- batch_idx – The index of this batch.
- dataloader_idx – The index of the dataloader that produced this batch (only if multiple test dataloaders are used).

- Returns:
- Any of:
- Any object or value
- None - Testing will skip to the next batch

    # if you have one test dataloader:
    def test_step(self, batch, batch_idx):
        ...

    # if you have multiple test dataloaders:
    def test_step(self, batch, batch_idx, dataloader_idx=0):
        ...

- Examples:

    import torch
    import torchvision

    # CASE 1: A single test dataset
    def test_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image("example_images", grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        test_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({"test_loss": loss, "test_acc": test_acc})

- If you pass in multiple test dataloaders, test_step() will have an additional argument. We recommend setting its default value to 0 so that you can quickly switch between single and multiple dataloaders.

    # CASE 2: multiple test dataloaders
    def test_step(self, batch, batch_idx, dataloader_idx=0):
        # dataloader_idx tells you which dataset this is.
        ...

- Note
- If you don’t need to test, you don’t need to implement this method.
- Note
- When test_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of the test epoch, the model goes back to training mode and gradients are enabled.
 - training_step(batch, batch_idx)§
- Here you compute and return the training loss and, optionally, additional metrics for the progress bar or logger.
- Parameters:
- batch (Tensor | (Tensor, …) | [Tensor, …]) – The output of your DataLoader: a tensor, tuple or list.
- batch_idx (int) – Index of this batch.
- optimizer_idx (int) – When using multiple optimizers, this argument will also be present.
- hiddens (Any) – Passed in if LightningModule.truncated_bptt_steps > 0.

- Returns:
- Any of:
- Tensor - The loss tensor
- dict - A dictionary. Can include any keys, but must include the key 'loss'.
- None - Training will skip to the next batch. This is only for automatic optimization.
- This is not supported for multi-GPU, TPU, IPU, or DeepSpeed.


- In this step you’d normally do the forward pass and calculate the loss for a batch. You can also do fancier things, such as multiple forward passes or something model-specific.
- Example:

    def training_step(self, batch, batch_idx):
        x, y, z = batch
        out = self.encoder(x)
        loss = self.loss(out, x)  # reconstruction loss against the input
        return loss

- If you define multiple optimizers, this step will be called with an additional optimizer_idx parameter.

    # Multiple optimizers (e.g. GANs)
    def training_step(self, batch, batch_idx, optimizer_idx):
        if optimizer_idx == 0:
            # do training_step with encoder
            ...
        if optimizer_idx == 1:
            # do training_step with decoder
            ...

- If you add truncated back-propagation through time, you will also get an additional argument with the hidden states of the previous step.

    # Truncated back-propagation through time
    def training_step(self, batch, batch_idx, hiddens):
        # hiddens are the hidden states from the previous truncated backprop step
        x, y = batch
        out, hiddens = self.lstm(x, hiddens)
        loss = ...
        return {"loss": loss, "hiddens": hiddens}

- Note
- The loss value shown in the progress bar is smoothed (averaged) over the most recent values, so it differs from the actual loss returned in the training/validation step.
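To see why the progress-bar value differs from the loss returned by training_step(), a running-window mean is enough. This is a minimal sketch of the idea only: Lightning’s exact smoothing (window length and implementation) is not documented here, so window is a hypothetical parameter with an assumed default.

```python
from collections import deque
from typing import Deque, Iterable, List


def smoothed_losses(losses: Iterable[float], window: int = 20) -> List[float]:
    """Return the mean of the last `window` losses after each training step.

    The default window of 20 is an illustrative assumption, not a value
    taken from this library.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    recent: Deque[float] = deque(maxlen=window)  # keeps only the newest values
    smoothed = []
    for loss in losses:
        recent.append(loss)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


if __name__ == "__main__":
    print(smoothed_losses([4.0, 2.0, 0.0, 2.0], window=2))  # [4.0, 3.0, 1.0, 1.0]
```

The last raw loss is 2.0 while the smoothed value is 1.0, which is the kind of discrepancy the note above describes.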
 - validation_epoch_end(output)§
- Called at the end of the validation epoch with the outputs of all validation steps.

    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)

- Parameters:
- outputs – List of outputs you defined in validation_step(), or, if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
- None
- Note
- If you didn’t define a validation_step(), this won’t be called.
- Examples
- With a single dataloader:

    def validation_epoch_end(self, val_step_outputs):
        for out in val_step_outputs:
            ...

- With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each validation step for that dataloader.

    def validation_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            for val_step_out in dataloader_outputs:
                # do something
                final_value += val_step_out
        self.log("final_metric", final_value)
 - validation_step(batch, batch_idx)§
- Operates on a single batch of data from the validation set. In this step you might generate examples or calculate anything of interest, such as accuracy.

    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)

- Parameters:
- batch – The output of your DataLoader.
- batch_idx – The index of this batch.
- dataloader_idx – The index of the dataloader that produced this batch (only if multiple val dataloaders are used).

- Returns:
- Any object or value
- None - Validation will skip to the next batch

    # pseudocode of order
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        if defined("validation_step_end"):
            out = validation_step_end(out)
        val_outs.append(out)
    validation_epoch_end(val_outs)

    # if you have one val dataloader:
    def validation_step(self, batch, batch_idx):
        ...

    # if you have multiple val dataloaders:
    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        ...

- Examples:

    import torch
    import torchvision

    # CASE 1: A single validation dataset
    def validation_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image("example_images", grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        val_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({"val_loss": loss, "val_acc": val_acc})

- If you pass in multiple val dataloaders, validation_step() will have an additional argument. We recommend setting its default value to 0 so that you can quickly switch between single and multiple dataloaders.

    # CASE 2: multiple validation dataloaders
    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        # dataloader_idx tells you which dataset this is.
        ...

- Note
- If you don’t need to validate, you don’t need to implement this method.
- Note
- When validation_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of validation, the model goes back to training mode and gradients are enabled.
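The "pseudocode of order" above can be turned into a small runnable simulation, which makes it easy to check what validation_epoch_end() finally receives. This is a toy model of the documented hook order only, not Lightning’s actual evaluation loop (which also switches to eval mode, disables gradients and handles multiple dataloaders); run_validation_epoch is a hypothetical name.

```python
from typing import Any, Callable, Iterable, List, Optional


def run_validation_epoch(
    val_data: Iterable[Any],
    validation_step: Callable[[Any, int], Any],
    validation_epoch_end: Callable[[List[Any]], None],
    validation_step_end: Optional[Callable[[Any], Any]] = None,
) -> None:
    """Call the hooks in the order described by the pseudocode above."""
    val_outs: List[Any] = []
    for batch_idx, val_batch in enumerate(val_data):
        out = validation_step(val_batch, batch_idx)
        if validation_step_end is not None:  # optional hook, only if defined
            out = validation_step_end(out)
        val_outs.append(out)
    validation_epoch_end(val_outs)  # receives every (post-processed) step output


if __name__ == "__main__":
    received = []
    run_validation_epoch(
        val_data=[[1, 2], [3, 4]],
        validation_step=lambda batch, idx: sum(batch),
        validation_epoch_end=received.append,
        validation_step_end=lambda out: out * 10,
    )
    print(received)  # [[30, 70]]
```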
 
LSTUR§
- class src.models.LSTUR.LSTUR(learning_rate, num_users, num_words, word_embedding_dim, num_categories, num_filters, window_size, query_vector_dim, long_short_term_method, pretrained_word_embedding_path, masking_probability, dropout_probability, metrics={})§
- LSTUR network. Takes 1 + K candidate news items and a list of news clicked by the user as input, and produces the click probability.
- configure_optimizers()§
- Choose which optimizers and learning-rate schedulers to use in your optimization. Normally you need one, but in the case of GANs or similar you might have multiple.
- Returns:
- Any of these 6 options:
- Single optimizer.
- List or Tuple of optimizers.
- Two lists - The first list has multiple optimizers, and the second has multiple LR schedulers (or multiple lr_scheduler_config).
- Dictionary, with an "optimizer" key, and (optionally) an "lr_scheduler" key whose value is a single LR scheduler or lr_scheduler_config.
- Tuple of dictionaries as described above, with an optional "frequency" key.
- None - Fit will run without any optimizer.

- The lr_scheduler_config is a dictionary that contains the scheduler and its associated configuration. The default configuration is shown below.

    lr_scheduler_config = {
        # REQUIRED: The scheduler instance
        "scheduler": lr_scheduler,
        # The unit of the scheduler's step size, could also be 'step'.
        # 'epoch' updates the scheduler on epoch end whereas 'step'
        # updates it after an optimizer update.
        "interval": "epoch",
        # How many epochs/steps should pass between calls to
        # `scheduler.step()`. 1 corresponds to updating the learning
        # rate after every epoch/step.
        "frequency": 1,
        # Metric to monitor for schedulers like `ReduceLROnPlateau`
        "monitor": "val_loss",
        # If set to `True`, will enforce that the value specified in 'monitor'
        # is available when the scheduler is updated, thus stopping
        # training if not found. If set to `False`, it will only produce a warning
        "strict": True,
        # If using the `LearningRateMonitor` callback to monitor the
        # learning rate progress, this keyword can be used to specify
        # a custom logged name
        "name": None,
    }

- When there are schedulers in which the .step() method is conditioned on a value, such as the torch.optim.lr_scheduler.ReduceLROnPlateau scheduler, Lightning requires that the lr_scheduler_config contains the keyword "monitor" set to the metric name that the scheduler should be conditioned on.
- Metrics can be made available to monitor by simply logging them using self.log('metric_to_track', metric_val) in your LightningModule.
- Note
- The frequency value specified in a dict along with the optimizer key is an int corresponding to the number of sequential batches optimized with the specific optimizer. It should be given to none or to all of the optimizers. There is a difference between passing multiple optimizers in a list and passing multiple optimizers in dictionaries with a frequency of 1:
- In the former case, all optimizers will operate on the given batch in each optimization step.
- In the latter, only one optimizer will operate on the given batch at every step. 
- This is different from the frequency value specified in the lr_scheduler_config mentioned above.

    import torch

    def configure_optimizers(self):
        optimizer_one = torch.optim.SGD(self.model.parameters(), lr=0.01)
        optimizer_two = torch.optim.SGD(self.model.parameters(), lr=0.01)
        return [
            {"optimizer": optimizer_one, "frequency": 5},
            {"optimizer": optimizer_two, "frequency": 10},
        ]

- In this example, the first optimizer is used for the first 5 steps, the second optimizer for the next 10 steps, and then the cycle repeats. If an LR scheduler is specified for an optimizer using the lr_scheduler key in the above dict, the scheduler is only updated while its optimizer is being used.
- Examples:

    from torch.optim import Adam
    from torch.optim.lr_scheduler import CosineAnnealingLR, ExponentialLR

    # most cases: no learning rate scheduler
    def configure_optimizers(self):
        return Adam(self.parameters(), lr=1e-3)

    # multiple optimizer case (e.g. GAN)
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        return gen_opt, dis_opt

    # example with learning rate schedulers
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        dis_sch = CosineAnnealingLR(dis_opt, T_max=10)
        return [gen_opt, dis_opt], [dis_sch]

    # example with step-based learning rate schedulers:
    # each optimizer has its own scheduler
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        gen_sch = {
            "scheduler": ExponentialLR(gen_opt, 0.99),
            "interval": "step",  # called after each training step
        }
        dis_sch = CosineAnnealingLR(dis_opt, T_max=10)  # called every epoch
        return [gen_opt, dis_opt], [gen_sch, dis_sch]

    # example with optimizer frequencies
    # see the training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1
    # https://arxiv.org/abs/1704.00028
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        n_critic = 5
        return (
            {"optimizer": dis_opt, "frequency": n_critic},
            {"optimizer": gen_opt, "frequency": 1},
        )

- Note
- Some things to know:
- Lightning calls .backward() and .step() on each optimizer as needed.
- If a learning rate scheduler is specified in configure_optimizers() with the key "interval" (default "epoch") in the scheduler configuration, Lightning calls the scheduler’s .step() method automatically when automatic optimization is used.
- If you use 16-bit precision (precision=16), Lightning handles the optimizers automatically.
- If you use multiple optimizers, training_step() has an additional optimizer_idx parameter.
- If you use torch.optim.LBFGS, Lightning handles the closure function automatically for you.
- If you use multiple optimizers, gradients are calculated only for the parameters of the current optimizer at each training step.
- If you need to control how often those optimizers step or override the default .step() schedule, override the optimizer_step() hook.
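The "interval" and "frequency" keys of lr_scheduler_config together decide when scheduler.step() runs. The sketch below is a simplified model of the semantics described in the config comments (step on epoch end vs. after each optimizer update, every frequency units); it is not Lightning’s own bookkeeping, and scheduler_step_points is a hypothetical helper.

```python
from typing import List


def scheduler_step_points(
    num_epochs: int, steps_per_epoch: int, interval: str = "epoch", frequency: int = 1
) -> List[int]:
    """Return the global optimizer-step counts at which scheduler.step() would run.

    interval="step": count optimizer updates and step every `frequency` updates.
    interval="epoch": count epochs and step at the end of every `frequency`-th
    epoch (reported as the step count reached at that epoch end).
    """
    if interval not in ("epoch", "step"):
        raise ValueError("interval must be 'epoch' or 'step'")
    if frequency < 1:
        raise ValueError("frequency must be >= 1")
    if interval == "step":
        total_steps = num_epochs * steps_per_epoch
        return [s for s in range(1, total_steps + 1) if s % frequency == 0]
    return [e * steps_per_epoch for e in range(1, num_epochs + 1) if e % frequency == 0]


if __name__ == "__main__":
    print(scheduler_step_points(3, 4))                # default config: [4, 8, 12]
    print(scheduler_step_points(2, 4, "step", 3))     # [3, 6]
```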
 
 - forward(user, clicked_news_length, candidate_news, clicked_news)§
- Parameters:
- user – batch_size
- clicked_news_length – batch_size
- candidate_news – list of (1 + K) dicts, each of the form {“category”: batch_size, “subcategory”: batch_size, “title”: batch_size * num_words_title}
- clicked_news – list of num_clicked_news_a_user dicts, each of the form {“category”: batch_size, “subcategory”: batch_size, “title”: batch_size * num_words_title}

- Returns:
- click_probability, shape: batch_size
 
 - get_prediction(news_vector, user_vector)§
- Parameters:
- news_vector – candidate_size, word_embedding_dim 
- user_vector – word_embedding_dim 
 
- Returns:
- click_probability, shape: candidate_size
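The shapes above (candidate_size × D news vectors and one D-dimensional user vector, giving candidate_size scores) fit an inner-product scorer. The sketch below assumes the inner-product click predictor described in the LSTUR paper; this docstring does not state which predictor the library uses, and dot_product_scores is a hypothetical name.

```python
from typing import List, Sequence


def dot_product_scores(
    news_vector: Sequence[Sequence[float]], user_vector: Sequence[float]
) -> List[float]:
    """Score each candidate news vector against one user vector.

    news_vector: candidate_size rows of length D; user_vector: length D.
    Returns candidate_size raw scores (higher means more likely to be clicked).
    """
    for row in news_vector:
        if len(row) != len(user_vector):
            raise ValueError("news and user vectors must have the same dimension")
    return [sum(n * u for n, u in zip(row, user_vector)) for row in news_vector]


if __name__ == "__main__":
    candidates = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]  # candidate_size = 3, D = 2
    user = [2.0, 1.0]
    print(dot_product_scores(candidates, user))  # [2.0, 1.5, 2.0]
```

These are raw scores; turning them into probabilities (for example, a softmax over the 1 + K candidates during training) is a separate step.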
 
 - get_user_vector(user, clicked_news_length, clicked_news_vector)§
- Parameters:
- user – batch_size 
- clicked_news_length – batch_size 
- clicked_news_vector – batch_size, num_clicked_news_a_user, num_filters * 3 
 
- Returns:
- (shape) batch_size, num_filters * 3 
 
 - on_test_start()§
- Called at the beginning of testing. 
 - on_train_start()§
- Called at the beginning of training after sanity check. 
 - on_validation_start()§
- Called at the beginning of validation. 
 - test_epoch_end(output)§
- Called at the end of a test epoch with the outputs of all test steps.

    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)

- Parameters:
- outputs – List of outputs you defined in test_step_end(), or, if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
- None
- Note
- If you didn’t define a test_step(), this won’t be called.
- Examples
- With a single dataloader:

    def test_epoch_end(self, outputs):
        # do something with the outputs of all test batches;
        # assumes each test_step() returned a dict with a "predictions" entry
        all_test_preds = [out["predictions"] for out in outputs]
        some_result = calc_all_results(all_test_preds)  # placeholder for your own computation
        self.log("some_result", some_result)

- With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each test step for that dataloader.

    def test_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            for test_step_out in dataloader_outputs:
                # do something
                final_value += test_step_out
        self.log("final_metric", final_value)
 - test_step(batch, batch_idx)§
- Operates on a single batch of data from the test set. In this step you’d normally generate examples or calculate anything of interest, such as accuracy.

    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)

- Parameters:
- batch – The output of your DataLoader.
- batch_idx – The index of this batch.
- dataloader_idx – The index of the dataloader that produced this batch (only if multiple test dataloaders are used).

- Returns:
- Any of:
- Any object or value
- None - Testing will skip to the next batch

    # if you have one test dataloader:
    def test_step(self, batch, batch_idx):
        ...

    # if you have multiple test dataloaders:
    def test_step(self, batch, batch_idx, dataloader_idx=0):
        ...

- Examples:

    import torch
    import torchvision

    # CASE 1: A single test dataset
    def test_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image("example_images", grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        test_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({"test_loss": loss, "test_acc": test_acc})

- If you pass in multiple test dataloaders, test_step() will have an additional argument. We recommend setting its default value to 0 so that you can quickly switch between single and multiple dataloaders.

    # CASE 2: multiple test dataloaders
    def test_step(self, batch, batch_idx, dataloader_idx=0):
        # dataloader_idx tells you which dataset this is.
        ...

- Note
- If you don’t need to test, you don’t need to implement this method.
- Note
- When test_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of the test epoch, the model goes back to training mode and gradients are enabled.
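The test_acc line in CASE 1 takes the argmax of each row of the model output and counts how many match the labels. The same arithmetic in plain Python (no torch) makes the computation explicit; argmax and accuracy here are hypothetical helpers for illustration, and ties resolve to the first maximal index.

```python
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest value; the first maximal index wins on ties."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of rows whose argmax equals the label."""
    if len(logits) != len(labels) or not labels:
        raise ValueError("need the same, non-zero number of predictions and labels")
    labels_hat: List[int] = [argmax(row) for row in logits]
    correct = sum(int(y == y_hat) for y, y_hat in zip(labels, labels_hat))
    return correct / len(labels)


if __name__ == "__main__":
    logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]  # 3 samples, 2 classes
    print(accuracy(logits, [1, 0, 0]))  # 2 of 3 correct
```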
 - training_step(batch, batch_idx)§
- Here you compute and return the training loss and, optionally, additional metrics for the progress bar or logger.
- Parameters:
- batch (Tensor | (Tensor, …) | [Tensor, …]) – The output of your DataLoader: a tensor, tuple or list.
- batch_idx (int) – Index of this batch.
- optimizer_idx (int) – When using multiple optimizers, this argument will also be present.
- hiddens (Any) – Passed in if LightningModule.truncated_bptt_steps > 0.

- Returns:
- Any of:
- Tensor - The loss tensor
- dict - A dictionary. Can include any keys, but must include the key 'loss'.
- None - Training will skip to the next batch. This is only for automatic optimization.
- This is not supported for multi-GPU, TPU, IPU, or DeepSpeed.


- In this step you’d normally do the forward pass and calculate the loss for a batch. You can also do fancier things, such as multiple forward passes or something model-specific.
- Example:

    def training_step(self, batch, batch_idx):
        x, y, z = batch
        out = self.encoder(x)
        loss = self.loss(out, x)  # reconstruction loss against the input
        return loss

- If you define multiple optimizers, this step will be called with an additional optimizer_idx parameter.

    # Multiple optimizers (e.g. GANs)
    def training_step(self, batch, batch_idx, optimizer_idx):
        if optimizer_idx == 0:
            # do training_step with encoder
            ...
        if optimizer_idx == 1:
            # do training_step with decoder
            ...

- If you add truncated back-propagation through time, you will also get an additional argument with the hidden states of the previous step.

    # Truncated back-propagation through time
    def training_step(self, batch, batch_idx, hiddens):
        # hiddens are the hidden states from the previous truncated backprop step
        x, y = batch
        out, hiddens = self.lstm(x, hiddens)
        loss = ...
        return {"loss": loss, "hiddens": hiddens}

- Note
- The loss value shown in the progress bar is smoothed (averaged) over the most recent values, so it differs from the actual loss returned in the training/validation step.
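Truncated back-propagation through time processes a long sequence in shorter chunks: the hidden state is carried from one chunk to the next (the hiddens argument), but gradients do not flow across chunk boundaries. The sketch below only illustrates the splitting along the time axis, assuming a plain list stands in for the time dimension; it is not Lightning’s own splitting code, and split_for_tbptt is a hypothetical name.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_for_tbptt(sequence: Sequence[T], truncated_bptt_steps: int) -> List[Sequence[T]]:
    """Split a time series into consecutive chunks of at most truncated_bptt_steps.

    Each chunk would be passed to training_step() in turn, with the hidden
    state returned for one chunk fed back in as `hiddens` for the next.
    """
    if truncated_bptt_steps < 1:
        raise ValueError("truncated_bptt_steps must be >= 1")
    return [
        sequence[start:start + truncated_bptt_steps]
        for start in range(0, len(sequence), truncated_bptt_steps)
    ]


if __name__ == "__main__":
    print(split_for_tbptt(list(range(7)), 3))  # [[0, 1, 2], [3, 4, 5], [6]]
```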
 - validation_epoch_end(output)§
- Called at the end of the validation epoch with the outputs of all validation steps.

    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)

- Parameters:
- outputs – List of outputs you defined in validation_step(), or, if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
- None
- Note
- If you didn’t define a validation_step(), this won’t be called.
- Examples
- With a single dataloader:

    def validation_epoch_end(self, val_step_outputs):
        for out in val_step_outputs:
            ...

- With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each validation step for that dataloader.

    def validation_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            for val_step_out in dataloader_outputs:
                # do something
                final_value += val_step_out
        self.log("final_metric", final_value)
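When validation_epoch_end() reduces per-step outputs to one number, averaging per-batch means gives a smaller final batch too much weight. The sketch below weights each batch by its size instead; it assumes each validation_step() output has already been reduced to a per-batch mean and a batch size, which is a hypothetical output format, not one this library defines.

```python
from typing import Sequence


def weighted_epoch_mean(batch_values: Sequence[float], batch_sizes: Sequence[int]) -> float:
    """Combine per-batch mean values into a per-sample epoch mean.

    Each batch contributes value * size, so a short last batch does not
    count as much as a full one.
    """
    if len(batch_values) != len(batch_sizes) or sum(batch_sizes) == 0:
        raise ValueError("need matching, non-empty batch values and sizes")
    total = sum(v * n for v, n in zip(batch_values, batch_sizes))
    return total / sum(batch_sizes)


if __name__ == "__main__":
    # Plain mean of [1.0, 4.0] would be 2.5; weighting by size gives 1.75.
    print(weighted_epoch_mean([1.0, 4.0], [3, 1]))
```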
 - validation_step(batch, batch_idx)§
- Operates on a single batch of data from the validation set. In this step you might generate examples or calculate anything of interest, such as accuracy.

    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)

- Parameters:
- batch – The output of your DataLoader.
- batch_idx – The index of this batch.
- dataloader_idx – The index of the dataloader that produced this batch (only if multiple val dataloaders are used).

- Returns:
- Any object or value
- None - Validation will skip to the next batch

    # pseudocode of order
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        if defined("validation_step_end"):
            out = validation_step_end(out)
        val_outs.append(out)
    validation_epoch_end(val_outs)

    # if you have one val dataloader:
    def validation_step(self, batch, batch_idx):
        ...

    # if you have multiple val dataloaders:
    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        ...

- Examples:

    import torch
    import torchvision

    # CASE 1: A single validation dataset
    def validation_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image("example_images", grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        val_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({"val_loss": loss, "val_acc": val_acc})

- If you pass in multiple val dataloaders, validation_step() will have an additional argument. We recommend setting its default value to 0 so that you can quickly switch between single and multiple dataloaders.

    # CASE 2: multiple validation dataloaders
    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        # dataloader_idx tells you which dataset this is.
        ...

- Note
- If you don’t need to validate, you don’t need to implement this method.
- Note
- When validation_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of validation, the model goes back to training mode and gradients are enabled.
 
NAML§
- class src.models.naml.NAML(lr, word_embedding_dim, dataset_attributes, num_filters, window_size, query_vector_dim, dropout_probability, category_embedding_dim, num_words, num_categories, pretrained_word_embedding=None, metrics={})§
- NAML: Neural News Recommendation with Attentive Multi-View Learning
- NAML is a multi-view news recommendation approach. Its core consists of a news encoder and a user encoder. The news encoder is composed of a title encoder, a body encoder, a vert (category) encoder and a subvert (subcategory) encoder. The CNN-based title and body encoders learn title and body representations by capturing the semantic information of words. After the title, body, vert and subvert representations are computed, an attention network aggregates them into a single news vector. The user encoder learns user representations from the news each user has browsed. In addition, additive attention is applied to learn more informative news and user representations by selecting important words and news.
- Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang and Xing Xie: Neural News Recommendation with Attentive Multi-View Learning, IJCAI 2019
- Based on the implementation from https://github.com/yusanshi/news-recommendation
- Parameters:
- lr (float) – Learning rate 
- word_embedding_dim (int) – Dimension for word embeddings 
- dataset_attributes (dict) – Dataset attributes; the NAML model requires {news: [category, subcategory, title, abstract], record: []}
- num_filters (int) – Number of filters 
- window_size (int) – Size of window 
- query_vector_dim (int) – Dimension of query vector 
- dropout_probability (float) – Probability for dropout layer 
- category_embedding_dim (int) – Dimension for category embeddings 
- num_words (int) – Number of words 
- num_categories (int) – Number of categories 
- pretrained_word_embedding (Optional[str]) – Path to pretrained word embeddings 
- metrics (dict) – Torchmetrics style classes to be used for evaluation 
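The additive attention mentioned in the description pools a set of vectors (words, views or clicked news) into one vector. The sketch below follows the additive-attention formulation from the NAML paper, score_i = q · tanh(V h_i + v), followed by a softmax and a weighted sum. It is written in plain Python for readability and is not this library’s module; the argument names and the assumption that weight has shape query_vector_dim × input_dim are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def additive_attention(
    vectors: Sequence[Vector], weight: Matrix, bias: Vector, query: Vector
) -> List[float]:
    """Pool `vectors` into one vector with additive attention.

    weight: query_vector_dim rows of length input_dim; bias and query:
    length query_vector_dim. Returns a vector of length input_dim.
    """
    if not vectors:
        raise ValueError("need at least one vector to attend over")
    scores = []
    for h in vectors:
        # projected = tanh(weight @ h + bias), length query_vector_dim
        projected = [
            math.tanh(sum(w * x for w, x in zip(row, h)) + b)
            for row, b in zip(weight, bias)
        ]
        scores.append(sum(q * p for q, p in zip(query, projected)))
    max_score = max(scores)  # subtract the max for a numerically stable softmax
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    alphas = [e / total for e in exp_scores]  # attention weights, sum to 1
    dim = len(vectors[0])
    return [sum(a * h[d] for a, h in zip(alphas, vectors)) for d in range(dim)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # A zero query gives equal weights, i.e. the plain mean of the vectors.
    print(additive_attention([[1.0, 0.0], [0.0, 1.0]], identity, [0.0, 0.0], [0.0, 0.0]))
```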
 
 - configure_optimizers()§
- Choose which optimizers and learning-rate schedulers to use in your optimization. Normally you need one, but in the case of GANs or similar you might have multiple.
- Returns:
- Any of these 6 options:
- Single optimizer.
- List or Tuple of optimizers.
- Two lists - The first list has multiple optimizers, and the second has multiple LR schedulers (or multiple lr_scheduler_config).
- Dictionary, with an "optimizer" key, and (optionally) an "lr_scheduler" key whose value is a single LR scheduler or lr_scheduler_config.
- Tuple of dictionaries as described above, with an optional "frequency" key.
- None - Fit will run without any optimizer.

- The lr_scheduler_config is a dictionary that contains the scheduler and its associated configuration. The default configuration is shown below.

    lr_scheduler_config = {
        # REQUIRED: The scheduler instance
        "scheduler": lr_scheduler,
        # The unit of the scheduler's step size, could also be 'step'.
        # 'epoch' updates the scheduler on epoch end whereas 'step'
        # updates it after an optimizer update.
        "interval": "epoch",
        # How many epochs/steps should pass between calls to
        # `scheduler.step()`. 1 corresponds to updating the learning
        # rate after every epoch/step.
        "frequency": 1,
        # Metric to monitor for schedulers like `ReduceLROnPlateau`
        "monitor": "val_loss",
        # If set to `True`, will enforce that the value specified in 'monitor'
        # is available when the scheduler is updated, thus stopping
        # training if not found. If set to `False`, it will only produce a warning
        "strict": True,
        # If using the `LearningRateMonitor` callback to monitor the
        # learning rate progress, this keyword can be used to specify
        # a custom logged name
        "name": None,
    }

- When there are schedulers in which the .step() method is conditioned on a value, such as the torch.optim.lr_scheduler.ReduceLROnPlateau scheduler, Lightning requires that the lr_scheduler_config contains the keyword "monitor" set to the metric name that the scheduler should be conditioned on.
- Metrics can be made available to monitor by simply logging them using self.log('metric_to_track', metric_val) in your LightningModule.
- Note
- The frequency value specified in a dict along with the optimizer key is an int corresponding to the number of sequential batches optimized with the specific optimizer. It should be given to none or to all of the optimizers. There is a difference between passing multiple optimizers in a list and passing multiple optimizers in dictionaries with a frequency of 1:
- In the former case, all optimizers will operate on the given batch in each optimization step.
- In the latter, only one optimizer will operate on the given batch at every step. 
 - This is different from the - frequencyvalue specified in the- lr_scheduler_configmentioned above.- def configure_optimizers(self): optimizer_one = torch.optim.SGD(self.model.parameters(), lr=0.01) optimizer_two = torch.optim.SGD(self.model.parameters(), lr=0.01) return [ {"optimizer": optimizer_one, "frequency": 5}, {"optimizer": optimizer_two, "frequency": 10}, ] - In this example, the first optimizer will be used for the first 5 steps, the second optimizer for the next 10 steps and that cycle will continue. If an LR scheduler is specified for an optimizer using the - lr_schedulerkey in the above dict, the scheduler will only be updated when its optimizer is being used.- Examples: - # most cases. no learning rate scheduler def configure_optimizers(self): return Adam(self.parameters(), lr=1e-3) # multiple optimizer case (e.g.: GAN) def configure_optimizers(self): gen_opt = Adam(self.model_gen.parameters(), lr=0.01) dis_opt = Adam(self.model_dis.parameters(), lr=0.02) return gen_opt, dis_opt # example with learning rate schedulers def configure_optimizers(self): gen_opt = Adam(self.model_gen.parameters(), lr=0.01) dis_opt = Adam(self.model_dis.parameters(), lr=0.02) dis_sch = CosineAnnealing(dis_opt, T_max=10) return [gen_opt, dis_opt], [dis_sch] # example with step-based learning rate schedulers # each optimizer has its own scheduler def configure_optimizers(self): gen_opt = Adam(self.model_gen.parameters(), lr=0.01) dis_opt = Adam(self.model_dis.parameters(), lr=0.02) gen_sch = { 'scheduler': ExponentialLR(gen_opt, 0.99), 'interval': 'step' # called after each training step } dis_sch = CosineAnnealing(dis_opt, T_max=10) # called every epoch return [gen_opt, dis_opt], [gen_sch, dis_sch] # example with optimizer frequencies # see training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1 # https://arxiv.org/abs/1704.00028 def configure_optimizers(self): gen_opt = Adam(self.model_gen.parameters(), lr=0.01) dis_opt = 
Adam(self.model_dis.parameters(), lr=0.02) n_critic = 5 return ( {'optimizer': dis_opt, 'frequency': n_critic}, {'optimizer': gen_opt, 'frequency': 1} ) - Note - Some things to know: - Lightning calls - .backward()and- .step()on each optimizer as needed.
- If a learning rate scheduler is specified in configure_optimizers() with the key "interval" (default "epoch") in the scheduler configuration, Lightning will call the scheduler's .step() method automatically in case of automatic optimization.
- If you use 16-bit precision (precision=16), Lightning will automatically handle the optimizers.
- If you use multiple optimizers, training_step() will have an additional optimizer_idx parameter.
- If you use torch.optim.LBFGS, Lightning handles the closure function automatically for you.
- If you use multiple optimizers, gradients will be calculated only for the parameters of the current optimizer at each training step.
- If you need to control how often those optimizers step or override the default .step() schedule, override the optimizer_step() hook.
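The frequency cycle described above can be sketched in plain Python. The helper optimizer_index_for_batch is hypothetical, not part of Lightning or this library; it only reproduces the documented rule that optimizer i handles frequencies[i] consecutive batches before the next one takes over.

```python
from typing import Sequence


def optimizer_index_for_batch(batch_idx: int, frequencies: Sequence[int]) -> int:
    """Return the index of the optimizer that handles batch `batch_idx`.

    Hypothetical helper: optimizer i is used for frequencies[i] consecutive
    batches, then the next optimizer takes over, and the cycle repeats.
    """
    if not frequencies or any(f <= 0 for f in frequencies):
        raise ValueError("frequencies must be a non-empty sequence of positive ints")
    # Position of this batch inside one full cycle (5 + 10 = 15 batches below).
    position = batch_idx % sum(frequencies)
    for idx, freq in enumerate(frequencies):
        if position < freq:
            return idx
        position -= freq
    raise AssertionError("unreachable: position always falls inside the cycle")


# Frequencies from the configure_optimizers example: 5 batches, then 10 batches.
schedule = [optimizer_index_for_batch(i, [5, 10]) for i in range(16)]
print(schedule)
```

Batch 15 starts the second cycle, so it goes back to optimizer 0.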
 
 - forward(candidate_news, clicked_news)§
- Parameters:
- candidate_news – list of (1 + K) dicts, each of the form
    {"category": batch_size, "subcategory": batch_size, "title": batch_size * num_words_title, "abstract": batch_size * num_words_abstract}
- clicked_news – list of num_clicked_news_a_user dicts, each of the form
    {"category": batch_size, "subcategory": batch_size, "title": batch_size * num_words_title, "abstract": batch_size * num_words_abstract}
 
- Returns:
- click_probability (shape: batch_size)
 
 - get_news_vector(news)§
- Parameters:
- news – {"category": batch_size, "subcategory": batch_size, "title": batch_size * num_words_title, "abstract": batch_size * num_words_abstract}
- Returns:
- (shape) batch_size, num_filters 
 
 - get_prediction(news_vector, user_vector)§
- Parameters:
- news_vector – candidate_size, word_embedding_dim 
- user_vector – word_embedding_dim 
 
- Returns:
- click_probability (shape: candidate_size)
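The shapes listed for get_prediction (candidate_size × embedding dimension for the news, a single embedding vector for the user, one score per candidate) are consistent with a dot-product scorer. The plain-Python sketch below assumes that; it is not taken from this library's source, and the real method may use a different scoring function or apply a sigmoid. dot_product_scores is a hypothetical name.

```python
from typing import List


def dot_product_scores(news_vector: List[List[float]], user_vector: List[float]) -> List[float]:
    """Score each candidate by its dot product with the user vector.

    news_vector: candidate_size x embedding_dim
    user_vector: embedding_dim
    returns:     candidate_size scores
    """
    if any(len(row) != len(user_vector) for row in news_vector):
        raise ValueError("embedding dimensions do not match")
    return [sum(n * u for n, u in zip(row, user_vector)) for row in news_vector]


candidates = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
user = [2.0, 1.0]
print(dot_product_scores(candidates, user))
```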
 
 - get_user_vector(clicked_news_vector)§
- Parameters:
- clicked_news_vector – batch_size, num_clicked_news_a_user, num_filters 
- Returns:
- (shape) batch_size, num_filters 
 
 - on_test_start()§
- Called at the beginning of testing. 
 - on_train_start()§
- Called at the beginning of training after sanity check. 
 - on_validation_start()§
- Called at the beginning of validation. 
 - test_epoch_end(output)§
- Called at the end of a test epoch with the outputs of all test steps.

    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)

- Parameters:
- outputs – List of outputs you defined in test_step_end(), or, if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
- None
 - Note - If you didn't define a test_step(), this won't be called.
 - Examples - With a single dataloader:

    def test_epoch_end(self, outputs):
        # do something with the outputs of all test batches
        # (assumes each test_step returned a dict with a "predictions" entry)
        all_test_preds = [out["predictions"] for out in outputs]
        some_result = calc_all_results(all_test_preds)
        self.log("some_result", some_result)

 - With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each test step for that dataloader.

    def test_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            for test_step_out in dataloader_outputs:
                # do something
                final_value += test_step_out
        self.log("final_metric", final_value)
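To make the list-of-lists layout concrete, here is a plain-Python sketch of the reduction a multi-dataloader test_epoch_end might perform. It assumes, hypothetically, that each test_step returned a dict with a "test_loss" float; the helper name and the key naming scheme are illustrations, not Lightning conventions.

```python
from statistics import mean
from typing import Dict, List


def summarize_test_outputs(outputs: List[List[Dict[str, float]]]) -> Dict[str, float]:
    """Average the "test_loss" entries separately for each dataloader.

    outputs: one inner list per dataloader, one dict per test_step call.
    """
    summary = {}
    for dataloader_idx, dataloader_outputs in enumerate(outputs):
        if not dataloader_outputs:
            continue  # a dataloader that produced no batches has nothing to average
        summary[f"test_loss/dataloader_{dataloader_idx}"] = mean(
            step_out["test_loss"] for step_out in dataloader_outputs
        )
    return summary


outputs = [
    [{"test_loss": 0.5}, {"test_loss": 0.25}],  # dataloader 0: two batches
    [{"test_loss": 1.0}],                       # dataloader 1: one batch
]
print(summarize_test_outputs(outputs))
```

Inside a LightningModule, the resulting dict could then be passed to self.log_dict.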
 - test_step(batch, batch_idx)§
- Operates on a single batch of data from the test set. In this step you'd normally generate examples or calculate anything of interest such as accuracy.

    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)

- Parameters:
- batch – The output of your DataLoader.
- batch_idx – The index of this batch.
- dataloader_idx – The index of the dataloader that produced this batch (only if multiple test dataloaders are used).
 
- Returns:
- Any of:
- Any object or value
- None - Testing will skip to the next batch
 

    # if you have one test dataloader:
    def test_step(self, batch, batch_idx):
        ...

    # if you have multiple test dataloaders:
    def test_step(self, batch, batch_idx, dataloader_idx=0):
        ...

 - Examples:

    # CASE 1: A single test dataset
    def test_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image('example_images', grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        test_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({'test_loss': loss, 'test_acc': test_acc})

 - If you pass in multiple test dataloaders, test_step() will have an additional argument. We recommend setting the default value of 0 so that you can quickly switch between single and multiple dataloaders.

    # CASE 2: multiple test dataloaders
    def test_step(self, batch, batch_idx, dataloader_idx=0):
        # dataloader_idx tells you which dataset this is.
        ...

 - Note - If you don't need to test, you don't need to implement this method.
 - Note - When test_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of the test epoch, the model goes back to training mode and gradients are enabled.
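The accuracy computation in CASE 1 (argmax over the class dimension, then the fraction of matching labels) can be checked in plain Python without torch. This is an illustrative sketch of that arithmetic only; argmax and batch_accuracy are hypothetical helpers.

```python
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest value; the first index wins on ties."""
    return max(range(len(row)), key=row.__getitem__)


def batch_accuracy(logits: List[List[float]], labels: List[int]) -> float:
    """Fraction of rows whose argmax matches the label (mirrors labels_hat / test_acc)."""
    if not labels or len(logits) != len(labels):
        raise ValueError("need exactly one logits row per label, and at least one label")
    labels_hat = [argmax(row) for row in logits]
    correct = sum(1 for pred, label in zip(labels_hat, labels) if pred == label)
    return correct / len(labels)


logits = [[0.1, 2.0], [3.0, -1.0], [0.2, 0.3], [1.0, 1.0]]
labels = [1, 0, 0, 0]
print(batch_accuracy(logits, labels))
```

The third row is predicted as class 1 against a label of 0, so three of the four rows match.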
 - training_step(batch, batch_idx)§
- Here you compute and return the training loss and some additional metrics, e.g. for the progress bar or logger.
- Parameters:
- batch (Tensor | (Tensor, …) | [Tensor, …]) – The output of your DataLoader. A tensor, tuple or list.
- batch_idx (int) – Integer displaying the index of this batch.
- optimizer_idx (int) – When using multiple optimizers, this argument will also be present.
- hiddens (Any) – Passed in if truncated_bptt_steps > 0.
 
- Returns:
- Any of:
- Tensor - The loss tensor
- dict - A dictionary. Can include any keys, but must include the key 'loss'
- None - Training will skip to the next batch. This is only for automatic optimization.
- This is not supported for multi-GPU, TPU, IPU, or DeepSpeed.
 
 
 - In this step you'd normally do the forward pass and calculate the loss for a batch. You can also do fancier things like multiple forward passes or something model specific.
 - Example:

    def training_step(self, batch, batch_idx):
        x, y, z = batch
        out = self.encoder(x)
        loss = self.loss(out, x)
        return loss

 - If you define multiple optimizers, this step will be called with an additional optimizer_idx parameter.

    # Multiple optimizers (e.g.: GANs)
    def training_step(self, batch, batch_idx, optimizer_idx):
        if optimizer_idx == 0:
            # do training_step with encoder
            ...
        if optimizer_idx == 1:
            # do training_step with decoder
            ...

 - If you add truncated back propagation through time, you will also get an additional argument with the hidden states of the previous step.

    # Truncated back-propagation through time
    def training_step(self, batch, batch_idx, hiddens):
        # hiddens are the hidden states from the previous truncated backprop step
        x, y = batch
        out, hiddens = self.lstm(x, hiddens)
        loss = ...
        return {"loss": loss, "hiddens": hiddens}

 - Note - The loss value shown in the progress bar is smoothed (averaged) over the last values, so it differs from the actual loss returned in the train/validation step.
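The note about the progress bar can be illustrated with a moving average over the most recent loss values. This is a plain-Python sketch, not Lightning's implementation; RunningMeanLoss is a hypothetical class, and its default window length of 20 is an assumption, so check your Lightning version for the actual smoothing behavior.

```python
from collections import deque
from typing import Deque


class RunningMeanLoss:
    """Mean of the most recent `window` loss values (hypothetical helper)."""

    def __init__(self, window: int = 20) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        # A deque with maxlen drops the oldest value automatically once full.
        self._values: Deque[float] = deque(maxlen=window)

    def update(self, loss: float) -> float:
        """Record a new raw loss and return the smoothed value to display."""
        self._values.append(loss)
        return sum(self._values) / len(self._values)


tracker = RunningMeanLoss(window=3)
raw_losses = [4.0, 2.0, 0.0, 6.0]
shown = [tracker.update(loss) for loss in raw_losses]
print(shown)
```

The last displayed value (about 2.67) differs from the last raw loss (6.0), which is the discrepancy the note describes.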
 - validation_epoch_end(output)§
- Called at the end of the validation epoch with the outputs of all validation steps.

    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)

- Parameters:
- outputs – List of outputs you defined in validation_step(), or, if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
- None
 - Note - If you didn't define a validation_step(), this won't be called.
 - Examples - With a single dataloader:

    def validation_epoch_end(self, val_step_outputs):
        for out in val_step_outputs:
            ...

 - With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each validation step for that dataloader.

    def validation_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            for val_step_out in dataloader_outputs:
                # do something
                final_value += val_step_out
        self.log("final_metric", final_value)
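Because outputs is a flat list with one dataloader and a list of lists with several, a reduction helper can normalize both layouts first. The sketch below assumes, hypothetically, that each validation_step returned {"val_loss": float, "batch_size": int}; per_dataloader and weighted_val_loss are illustrative names. Weighting by batch size keeps a smaller final batch from counting as much as a full one.

```python
from typing import Any, Dict, List

StepOutput = Dict[str, Any]


def per_dataloader(outputs: List[Any]) -> List[List[StepOutput]]:
    """Return one inner list per dataloader, whichever layout was passed in."""
    if outputs and isinstance(outputs[0], list):
        return outputs  # multiple dataloaders: already a list of lists
    return [outputs]  # single dataloader: wrap the flat list


def weighted_val_loss(outputs: List[Any]) -> List[float]:
    """Batch-size-weighted mean of "val_loss" for each dataloader."""
    results = []
    for dataloader_outputs in per_dataloader(outputs):
        total_examples = sum(out["batch_size"] for out in dataloader_outputs)
        if total_examples == 0:
            raise ValueError("a dataloader produced no examples")
        weighted_sum = sum(out["val_loss"] * out["batch_size"] for out in dataloader_outputs)
        results.append(weighted_sum / total_examples)
    return results


single = [{"val_loss": 1.0, "batch_size": 2}, {"val_loss": 4.0, "batch_size": 1}]
multiple = [single, [{"val_loss": 0.5, "batch_size": 4}]]
print(weighted_val_loss(single), weighted_val_loss(multiple))
```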
 - validation_step(batch, batch_idx)§
- Operates on a single batch of data from the validation set. In this step you might generate examples or calculate anything of interest like accuracy.

    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)

- Parameters:
- batch – The output of your DataLoader.
- batch_idx – The index of this batch.
- dataloader_idx – The index of the dataloader that produced this batch (only if multiple val dataloaders are used).
 
- Returns:
- Any object or value
- None - Validation will skip to the next batch
 

    # pseudocode of order
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        if defined("validation_step_end"):
            out = validation_step_end(out)
        val_outs.append(out)
    val_outs = validation_epoch_end(val_outs)

    # if you have one val dataloader:
    def validation_step(self, batch, batch_idx):
        ...

    # if you have multiple val dataloaders:
    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        ...

 - Examples:

    # CASE 1: A single validation dataset
    def validation_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image('example_images', grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        val_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({'val_loss': loss, 'val_acc': val_acc})

 - If you pass in multiple val dataloaders, validation_step() will have an additional argument. We recommend setting the default value of 0 so that you can quickly switch between single and multiple dataloaders.

    # CASE 2: multiple validation dataloaders
    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        # dataloader_idx tells you which dataset this is.
        ...

 - Note - If you don't need to validate, you don't need to implement this method.
 - Note - When validation_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of validation, the model goes back to training mode and gradients are enabled.
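The hook order in the pseudocode above (validation_step, then validation_step_end if it is defined, then validation_epoch_end on the collected outputs) can be mocked in plain Python. run_validation_epoch is a hypothetical stand-in written for illustration; Lightning's real loop also handles eval mode, gradient disabling, logging and multiple dataloaders, all of which this sketch omits.

```python
from typing import Any, Callable, Iterable, List, Optional


def run_validation_epoch(
    val_data: Iterable[Any],
    validation_step: Callable[[Any, int], Any],
    validation_step_end: Optional[Callable[[Any], Any]] = None,
    validation_epoch_end: Optional[Callable[[List[Any]], Any]] = None,
) -> Any:
    """Call the hooks in the documented order and return the epoch-end result."""
    val_outs: List[Any] = []
    for batch_idx, val_batch in enumerate(val_data):
        out = validation_step(val_batch, batch_idx)
        if validation_step_end is not None:  # optional per-batch post-processing
            out = validation_step_end(out)
        val_outs.append(out)
    if validation_epoch_end is not None:
        return validation_epoch_end(val_outs)
    return val_outs


calls = []


def step(batch, batch_idx):
    calls.append(("validation_step", batch_idx))
    return sum(batch)


def step_end(out):
    calls.append(("validation_step_end", out))
    return out * 10


result = run_validation_epoch([[1, 2], [3, 4, 5]], step, step_end, validation_epoch_end=sum)
print(result, calls)
```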
 
NRMS§
- class src.models.nrms.NRMS(lr, num_words, word_embedding_dim, num_attention_heads, query_vector_dim, dropout_probability, pretrained_word_embedding=None, metrics={})§
- NRMS is a neural news recommendation approach with multi-head self-attention. The core of NRMS is a news encoder and a user encoder. In the news encoder, multi-head self-attention is used to learn news representations from news titles by modeling the interactions between words. In the user encoder, user representations are learned from the users' browsed news, and multi-head self-attention is used to capture the relatedness between those news. In addition, additive attention is applied to learn more informative news and user representations by selecting important words and news. - Wu et al. “Neural News Recommendation with Multi-Head Self-Attention.” In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). - Based on the implementation from https://github.com/yusanshi/news-recommendation - Parameters:
- lr (float) – Learning rate 
- num_words (int) – Number of words 
- word_embedding_dim (int) – Dimension for word embeddings 
- num_attention_heads (int) – Number of attention heads 
- query_vector_dim (int) – Dimension of query vector 
- dropout_probability (float) – Probability for dropout layer 
- pretrained_word_embedding (Optional[str]) – Path to pretrained word embeddings 
- metrics (dict) – Torchmetrics style classes to be used for evaluation 
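The additive attention mentioned in the description scores each input vector against a learned query and pools the inputs with the resulting softmax weights. The NRMS paper writes this as a_i = q^T tanh(V h_i + v), alpha_i = softmax(a)_i, r = sum_i alpha_i h_i. The plain-Python sketch below assumes that formula, with V of shape query_vector_dim × dim; the function name and argument layout are hypothetical and are not this library's layer.

```python
import math
from typing import List, Sequence

Vector = List[float]


def additive_attention(
    vectors: Sequence[Vector],
    projection: Sequence[Vector],  # V: query_vector_dim rows, each of length dim
    bias: Vector,                  # v: length query_vector_dim
    query: Vector,                 # q: length query_vector_dim
) -> Vector:
    """Pool `vectors` (n x dim) into a single dim-sized vector with additive attention."""
    if not vectors:
        raise ValueError("need at least one vector to pool")
    scores = []
    for h in vectors:
        # hidden = tanh(V h + v), one entry per query dimension
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + b) for row, b in zip(projection, bias)]
        scores.append(sum(q * z for q, z in zip(query, hidden)))
    # Softmax over the scores; subtracting the max keeps exp() from overflowing.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(vectors[0])
    return [sum(w * h[d] for w, h in zip(weights, vectors)) for d in range(dim)]


clicked = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
v = [0.0, 0.0]
print(additive_attention(clicked, V, v, query=[0.0, 0.0]))  # zero query gives a plain average
```

With a zero query every score is equal, so the result is the mean of the inputs; a learned query shifts the weight toward the inputs it scores highest.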
 
 - configure_optimizers()§
- Choose what optimizers and learning-rate schedulers to use in your optimization. Normally you’d need one. But in the case of GANs or similar you might have multiple. - Returns:
- Any of these 6 options. - Single optimizer. 
- List or Tuple of optimizers. 
- Two lists - The first list has multiple optimizers, and the second has multiple LR schedulers (or multiple lr_scheduler_config).
- Dictionary, with an "optimizer" key, and (optionally) a "lr_scheduler" key whose value is a single LR scheduler or lr_scheduler_config.
- Tuple of dictionaries as described above, with an optional "frequency" key.
- None - Fit will run without any optimizer. 
 
 - The lr_scheduler_config is a dictionary which contains the scheduler and its associated configuration. The default configuration is shown below.

    lr_scheduler_config = {
        # REQUIRED: The scheduler instance
        "scheduler": lr_scheduler,
        # The unit of the scheduler's step size, could also be 'step'.
        # 'epoch' updates the scheduler on epoch end whereas 'step'
        # updates it after an optimizer update.
        "interval": "epoch",
        # How many epochs/steps should pass between calls to
        # `scheduler.step()`. 1 corresponds to updating the learning
        # rate after every epoch/step.
        "frequency": 1,
        # Metric to monitor for schedulers like `ReduceLROnPlateau`
        "monitor": "val_loss",
        # If set to `True`, will enforce that the value specified in 'monitor'
        # is available when the scheduler is updated, thus stopping
        # training if not found. If set to `False`, it will only produce a warning
        "strict": True,
        # If using the `LearningRateMonitor` callback to monitor the
        # learning rate progress, this keyword can be used to specify
        # a custom logged name
        "name": None,
    }

 - When there are schedulers in which the .step() method is conditioned on a value, such as the torch.optim.lr_scheduler.ReduceLROnPlateau scheduler, Lightning requires that the lr_scheduler_config contains the keyword "monitor" set to the metric name that the scheduler should be conditioned on.
 - Metrics can be made available to monitor by simply logging them using self.log('metric_to_track', metric_val) in your LightningModule.
 - Note - The frequency value specified in a dict along with the optimizer key is an int corresponding to the number of sequential batches optimized with the specific optimizer. It should be given to none or to all of the optimizers. There is a difference between passing multiple optimizers in a list, and passing multiple optimizers in dictionaries with a frequency of 1:
- In the former case, all optimizers will operate on the given batch in each optimization step.
- In the latter, only one optimizer will operate on the given batch at every step.
 - This is different from the frequency value specified in the lr_scheduler_config mentioned above.

    def configure_optimizers(self):
        optimizer_one = torch.optim.SGD(self.model.parameters(), lr=0.01)
        optimizer_two = torch.optim.SGD(self.model.parameters(), lr=0.01)
        return [
            {"optimizer": optimizer_one, "frequency": 5},
            {"optimizer": optimizer_two, "frequency": 10},
        ]

 - In this example, the first optimizer will be used for the first 5 steps, the second optimizer for the next 10 steps, and then the cycle repeats. If an LR scheduler is specified for an optimizer using the lr_scheduler key in the above dict, the scheduler will only be updated when its optimizer is being used.
 - Examples:

    from torch.optim import Adam
    from torch.optim.lr_scheduler import CosineAnnealingLR, ExponentialLR

    # most cases: no learning rate scheduler
    def configure_optimizers(self):
        return Adam(self.parameters(), lr=1e-3)

    # multiple optimizer case (e.g.: GAN)
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        return gen_opt, dis_opt

    # example with learning rate schedulers
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        dis_sch = CosineAnnealingLR(dis_opt, T_max=10)
        return [gen_opt, dis_opt], [dis_sch]

    # example with step-based learning rate schedulers
    # each optimizer has its own scheduler
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        gen_sch = {
            'scheduler': ExponentialLR(gen_opt, 0.99),
            'interval': 'step',  # called after each training step
        }
        dis_sch = CosineAnnealingLR(dis_opt, T_max=10)  # called every epoch
        return [gen_opt, dis_opt], [gen_sch, dis_sch]

    # example with optimizer frequencies
    # see training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1
    # https://arxiv.org/abs/1704.00028
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_dis.parameters(), lr=0.02)
        n_critic = 5
        return (
            {'optimizer': dis_opt, 'frequency': n_critic},
            {'optimizer': gen_opt, 'frequency': 1},
        )

 - Note - Some things to know:
- Lightning calls .backward() and .step() on each optimizer as needed.
- If a learning rate scheduler is specified in configure_optimizers() with the key "interval" (default "epoch") in the scheduler configuration, Lightning will call the scheduler's .step() method automatically in case of automatic optimization.
- If you use 16-bit precision (precision=16), Lightning will automatically handle the optimizers.
- If you use multiple optimizers, training_step() will have an additional optimizer_idx parameter.
- If you use torch.optim.LBFGS, Lightning handles the closure function automatically for you.
- If you use multiple optimizers, gradients will be calculated only for the parameters of the current optimizer at each training step.
- If you need to control how often those optimizers step or override the default .step() schedule, override the optimizer_step() hook.
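The "interval" and "frequency" keys of lr_scheduler_config decide when scheduler.step() runs. The plain-Python simulation below is a hypothetical illustration, not Lightning code; it assumes one optimizer step per batch (no gradient accumulation) and returns the global step counts after which the scheduler would be stepped.

```python
from typing import List


def scheduler_step_points(
    num_epochs: int, batches_per_epoch: int, interval: str = "epoch", frequency: int = 1
) -> List[int]:
    """Global optimizer-step counts after which scheduler.step() would be called."""
    if interval not in ("epoch", "step"):
        raise ValueError("interval must be 'epoch' or 'step'")
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    total_steps = num_epochs * batches_per_epoch
    if interval == "step":
        # Every `frequency` optimizer updates.
        return [s for s in range(1, total_steps + 1) if s % frequency == 0]
    # Every `frequency` epochs, at the end of the epoch.
    return [e * batches_per_epoch for e in range(1, num_epochs + 1) if e % frequency == 0]


print(scheduler_step_points(2, 3, interval="epoch"))
print(scheduler_step_points(2, 3, interval="step", frequency=2))
```

With the default "epoch" interval a scheduler such as CosineAnnealingLR advances once per epoch, while "step" makes it advance after optimizer updates, which changes how quickly T_max-style schedules are consumed.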
 
 - forward(candidate_news, clicked_news)§
- Parameters:
- candidate_news – list of (1 + K) dicts, each of the form {"title": batch_size * num_words_title}
- clicked_news – list of num_clicked_news_a_user dicts, each of the form {"title": batch_size * num_words_title}
 
- Returns:
- click_probability (shape: batch_size, 1 + K)
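forward returns one score per candidate, shape batch_size × (1 + K). With the 1 + K layout, training is commonly set up so that the first candidate is the clicked one and the remaining K are sampled negatives, with a softmax cross-entropy over the candidates. The plain-Python sketch below assumes exactly that (clicked item at index 0, dot-product scoring); both are assumptions about this library rather than documented facts, and click_scores and negative_sampling_loss are hypothetical names.

```python
import math
from typing import List

Matrix = List[List[float]]


def click_scores(candidate_vectors: List[Matrix], user_vectors: Matrix) -> Matrix:
    """Dot-product scores.

    candidate_vectors: batch_size x (1 + K) x dim
    user_vectors:      batch_size x dim
    returns:           batch_size x (1 + K)
    """
    return [
        [sum(c * u for c, u in zip(candidate, user)) for candidate in candidates]
        for candidates, user in zip(candidate_vectors, user_vectors)
    ]


def negative_sampling_loss(scores: Matrix, positive_index: int = 0) -> float:
    """Mean softmax cross-entropy over each row, treating `positive_index` as the click."""
    losses = []
    for row in scores:
        top = max(row)  # log-sum-exp trick for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in row))
        losses.append(log_sum_exp - row[positive_index])
    return sum(losses) / len(losses)


candidates = [[[1.0, 0.0], [0.0, 1.0]]]  # batch_size = 1, K = 1, dim = 2
users = [[2.0, 0.0]]
scores = click_scores(candidates, users)
print(scores, negative_sampling_loss(scores))
```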
 
 - get_news_vector(news)§
- Parameters:
- news – {"title": batch_size * num_words_title}
- Returns:
- (shape) batch_size, word_embedding_dim 
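In NRMS, the news encoder behind get_news_vector lets each title word attend to every other word. The sketch below shows only the core operation, single-head scaled dot-product self-attention, in plain Python. It deliberately omits the learned query/key/value projections, the num_attention_heads heads, dropout and the final pooling, so it illustrates the mechanism rather than this library's encoder; self_attention is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]


def self_attention(words: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention without learned projections.

    words: num_words x dim. Each output row is a softmax-weighted mix of all
    word vectors, weighted by their similarity to that row's word.
    """
    if not words:
        raise ValueError("need at least one word vector")
    dim = len(words[0])
    scale = math.sqrt(dim)
    outputs: Matrix = []
    for query in words:
        # Similarity of this word to every word (itself included), scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in words]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * value[d] for w, value in zip(weights, words)) for d in range(dim)])
    return outputs


title = [[10.0, 0.0], [0.0, 10.0]]  # two very different word vectors
print(self_attention(title))  # each word mostly attends to itself here
```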
 
 - get_prediction(news_vector, user_vector)§
- Parameters:
- news_vector – candidate_size, word_embedding_dim 
- user_vector – word_embedding_dim 
 
- Returns:
- click_probability (shape: candidate_size)
 
 - get_user_vector(clicked_news_vector)§
- Parameters:
- clicked_news_vector – batch_size, num_clicked_news_a_user, word_embedding_dim 
- Returns:
- (shape) batch_size, word_embedding_dim 
 
 - on_test_start()§
- Called at the beginning of testing. 
 - on_train_start()§
- Called at the beginning of training after sanity check. 
 - on_validation_start()§
- Called at the beginning of validation. 
 - test_epoch_end(output)§
- Called at the end of a test epoch with the outputs of all test steps.

    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)

- Parameters:
- outputs – List of outputs you defined in test_step_end(), or, if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
- None
 - Note - If you didn't define a test_step(), this won't be called.
 - Examples - With a single dataloader:

    def test_epoch_end(self, outputs):
        # do something with the outputs of all test batches
        # (assumes each test_step returned a dict with a "predictions" entry)
        all_test_preds = [out["predictions"] for out in outputs]
        some_result = calc_all_results(all_test_preds)
        self.log("some_result", some_result)

 - With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each test step for that dataloader.

    def test_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            for test_step_out in dataloader_outputs:
                # do something
                final_value += test_step_out
        self.log("final_metric", final_value)
 - test_step(batch, batch_idx)§
- Operates on a single batch of data from the test set. In this step you'd normally generate examples or calculate anything of interest such as accuracy.

    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)

- Parameters:
- batch – The output of your DataLoader.
- batch_idx – The index of this batch.
- dataloader_idx – The index of the dataloader that produced this batch (only if multiple test dataloaders are used).
 
- Returns:
- Any of:
- Any object or value
- None - Testing will skip to the next batch
 

    # if you have one test dataloader:
    def test_step(self, batch, batch_idx):
        ...

    # if you have multiple test dataloaders:
    def test_step(self, batch, batch_idx, dataloader_idx=0):
        ...

 - Examples:

    # CASE 1: A single test dataset
    def test_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image('example_images', grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        test_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({'test_loss': loss, 'test_acc': test_acc})

 - If you pass in multiple test dataloaders, test_step() will have an additional argument. We recommend setting the default value of 0 so that you can quickly switch between single and multiple dataloaders.

    # CASE 2: multiple test dataloaders
    def test_step(self, batch, batch_idx, dataloader_idx=0):
        # dataloader_idx tells you which dataset this is.
        ...

 - Note - If you don't need to test, you don't need to implement this method.
 - Note - When test_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of the test epoch, the model goes back to training mode and gradients are enabled.
 - training_step(batch, batch_idx)§
- Here you compute and return the training loss and some additional metrics, e.g. for the progress bar or logger.
- Parameters:
- batch (Tensor | (Tensor, …) | [Tensor, …]) – The output of your DataLoader. A tensor, tuple or list.
- batch_idx (int) – Integer displaying the index of this batch.
- optimizer_idx (int) – When using multiple optimizers, this argument will also be present.
- hiddens (Any) – Passed in if truncated_bptt_steps > 0.
 
- Returns:
- Any of:
- Tensor - The loss tensor
- dict - A dictionary. Can include any keys, but must include the key 'loss'
- None - Training will skip to the next batch. This is only for automatic optimization.
- This is not supported for multi-GPU, TPU, IPU, or DeepSpeed.
 
 
 - In this step you'd normally do the forward pass and calculate the loss for a batch. You can also do fancier things like multiple forward passes or something model specific.
 - Example:

    def training_step(self, batch, batch_idx):
        x, y, z = batch
        out = self.encoder(x)
        loss = self.loss(out, x)
        return loss

 - If you define multiple optimizers, this step will be called with an additional optimizer_idx parameter.

    # Multiple optimizers (e.g.: GANs)
    def training_step(self, batch, batch_idx, optimizer_idx):
        if optimizer_idx == 0:
            # do training_step with encoder
            ...
        if optimizer_idx == 1:
            # do training_step with decoder
            ...

 - If you add truncated back propagation through time, you will also get an additional argument with the hidden states of the previous step.

    # Truncated back-propagation through time
    def training_step(self, batch, batch_idx, hiddens):
        # hiddens are the hidden states from the previous truncated backprop step
        x, y = batch
        out, hiddens = self.lstm(x, hiddens)
        loss = ...
        return {"loss": loss, "hiddens": hiddens}

 - Note - The loss value shown in the progress bar is smoothed (averaged) over the last values, so it differs from the actual loss returned in the train/validation step.
 - validation_epoch_end(output)§
- Called at the end of the validation epoch with the outputs of all validation steps.

    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)

- Parameters:
- outputs – List of outputs you defined in validation_step(), or, if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
- None
 - Note - If you didn't define a validation_step(), this won't be called.
 - Examples - With a single dataloader:

    def validation_epoch_end(self, val_step_outputs):
        for out in val_step_outputs:
            ...

 - With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each validation step for that dataloader.

    def validation_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            for val_step_out in dataloader_outputs:
                # do something
                final_value += val_step_out
        self.log("final_metric", final_value)
 - validation_step(batch, batch_idx)§
- Operates on a single batch of data from the validation set. In this step you might generate examples or calculate anything of interest like accuracy.

    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)

- Parameters:
- batch – The output of your DataLoader.
- batch_idx – The index of this batch.
- dataloader_idx – The index of the dataloader that produced this batch (only if multiple val dataloaders are used).
 
- Returns:
- Any object or value
- None - Validation will skip to the next batch
 

    # pseudocode of order
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        if defined("validation_step_end"):
            out = validation_step_end(out)
        val_outs.append(out)
    val_outs = validation_epoch_end(val_outs)

    # if you have one val dataloader:
    def validation_step(self, batch, batch_idx):
        ...

    # if you have multiple val dataloaders:
    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        ...

 - Examples:

    # CASE 1: A single validation dataset
    def validation_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image('example_images', grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        val_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({'val_loss': loss, 'val_acc': val_acc})

 - If you pass in multiple val dataloaders, validation_step() will have an additional argument. We recommend setting the default value of 0 so that you can quickly switch between single and multiple dataloaders.

    # CASE 2: multiple validation dataloaders
    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        # dataloader_idx tells you which dataset this is.
        ...

 - Note - If you don't need to validate, you don't need to implement this method.
 - Note - When validation_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of validation, the model goes back to training mode and gradients are enabled.