Category: 02. Disadvantage
-
Contextual Information Loss
If sequences are padded and truncated to a fixed length that is shorter than some inputs, everything beyond that length is cut off, so longer sequences lose important contextual information.
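A minimal sketch in plain Python of how a fixed length causes truncation; the sequences and the `maxlen` value are assumed examples:

```python
# Truncate-then-pad to a fixed length; tokens beyond maxlen are discarded.
sequences = [
    [1, 2, 3],              # short: padded, nothing lost
    [4, 5, 6, 7, 8, 9],     # long: tokens 8 and 9 are dropped
]
maxlen = 4                  # assumed fixed length for illustration

padded = [seq[:maxlen] + [0] * (maxlen - len(seq)) for seq in sequences]
print(padded)  # [[1, 2, 3, 0], [4, 5, 6, 7]] -- the context 8, 9 is gone
```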
-
Memory Usage
Padding increases memory usage: every sequence is stored at the full padded length, so the pad positions enlarge the memory footprint of your data without adding any information. This can become a real issue when working with large datasets or long-tailed length distributions.
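A rough sketch of the effect, assuming NumPy and an example batch in which one long outlier dictates the padded width:

```python
import numpy as np

# One long outlier forces every sequence in the batch to its length.
lengths = [12, 15, 10, 14, 512]           # assumed example lengths
useful = sum(lengths)                     # tokens that carry data
stored = len(lengths) * max(lengths)      # tokens actually allocated

batch = np.zeros((len(lengths), max(lengths)), dtype=np.int64)
print(f"stored tokens: {stored}, useful: {useful}, "
      f"waste: {1 - useful / stored:.0%}")          # ~78% is padding
print(f"array size: {batch.nbytes / 1024:.1f} KiB")
```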
-
Impact on Model Performance
Unless pad positions are explicitly masked, the model treats them as real input; the injected values act as noise and disrupt the natural characteristics of the sequence, potentially weakening the model's ability to capture important patterns in the data.
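A small illustration of the point: without a mask, pad values leak into even a simple per-sequence statistic (the values here are invented for the example):

```python
import numpy as np

padded = np.array([3.0, 4.0, 5.0, 0.0, 0.0])   # last two entries are padding
mask = np.array([1.0, 1.0, 1.0, 0.0, 0.0])     # 1 = real token, 0 = pad

naive_mean = padded.mean()                         # 2.4 -- dragged down by pads
masked_mean = (padded * mask).sum() / mask.sum()   # 4.0 -- correct
print(naive_mean, masked_mean)
```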
-
Hyperparameter Sensitivity
Choosing an appropriate padding length is a trade-off: padding too much leads to unnecessary computation, while padding too little truncates inputs and loses information. The fixed length is therefore an extra hyperparameter that needs to be tuned carefully.
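One common heuristic, sketched below with NumPy, is to set the fixed length from a length percentile of the training data rather than the maximum; the 95th percentile and the lengths are assumed choices for illustration, not a universal rule:

```python
import numpy as np

train_lengths = np.array([8, 12, 15, 20, 22, 25, 30, 40, 45, 300])
maxlen = int(np.percentile(train_lengths, 95))   # ignore the extreme tail
truncated = (train_lengths > maxlen).mean()      # fraction of inputs cut
print(f"maxlen={maxlen}, sequences truncated: {truncated:.0%}")
```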
-
Increased Computation
When sequences are padded to a fixed length, the added zeros or pad tokens still flow through every layer, so they add to the computational load during training and inference and can slow both down.
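A back-of-the-envelope sketch: if per-token compute is roughly proportional to padded length, grouping similar-length sequences together ("bucketing") wastes far fewer padded positions than padding everything to the global maximum. The lengths and the bucket split are assumed examples:

```python
lengths = [10, 11, 12, 100, 101, 102]

global_max = max(lengths)
cost_global = len(lengths) * global_max        # pad all to 102: 612 tokens

buckets = [lengths[:3], lengths[3:]]           # group similar lengths
cost_bucketed = sum(len(b) * max(b) for b in buckets)  # 36 + 306 = 342

print(cost_global, cost_bucketed)              # ~44% fewer processed tokens
```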
-
Introduction of Artificial Information
Padding with zeros or special tokens injects artificial values that carry no meaning of their own. This may be harmless when padding is light and properly masked, but when padding is excessive these placeholder values can dilute the real signal and hurt model performance.
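A minimal sketch of the usual mitigation, assuming PyTorch: pad with a dedicated value and carry a boolean mask so that pad positions can be excluded from attention and loss rather than treated as real input:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [torch.tensor([4, 9, 2]), torch.tensor([7, 1])]
PAD = 0                                          # assumed padding token id

batch = pad_sequence(seqs, batch_first=True, padding_value=PAD)
mask = batch != PAD                              # True where tokens are real
print(batch)   # tensor([[4, 9, 2], [7, 1, 0]])
print(mask)    # tensor([[ True,  True,  True], [ True,  True, False]])
```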