Recommended rules for methods
For String-based methods:
Scientific notation works better than decimal notation. Character-level tokenization performs better than subword-level tokenization.
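The two string-based recommendations can be sketched together as follows. This is a minimal illustration, not a reference implementation: the helper names `to_scientific` and `char_tokenize` and the default of two mantissa digits are assumptions made for the example.

```python
from typing import List


def to_scientific(value: float, mantissa_digits: int = 2) -> str:
    """Render a number in scientific notation using Python's 'e' format.

    mantissa_digits is a hypothetical knob for this sketch; it controls how
    many digits follow the decimal point in the mantissa.
    """
    return f"{value:.{mantissa_digits}e}"


def char_tokenize(text: str) -> List[str]:
    """Character-level tokenization: every character becomes its own token."""
    return list(text)


number_string = to_scientific(1234.5)   # '1.23e+03'
tokens = char_tokenize(number_string)   # ['1', '.', '2', '3', 'e', '+', '0', '3']
```

With this representation the exponent is always spelled out in a fixed position, and the tokenizer never merges digits into arbitrary subword chunks.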
For Real-based methods:
Log scale works better than linear scale. Binning, for example with a dense cross-entropy loss, works better than continuous value prediction: binning the distribution at a chosen precision level is better than modeling continuous predictions.
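One way to picture log-scale binning is the sketch below, which maps a positive number to a bin index on a base-10 log scale and decodes a bin back to a representative value. The function names, the exponent range of -2 to 6, and the one-bin-per-decade default are all assumptions chosen for illustration; a model would treat the bin index as a classification target (trained with cross-entropy) instead of regressing the raw value.

```python
import math


def log_bin_index(value: float, low_exp: int = -2, high_exp: int = 6,
                  bins_per_decade: int = 1) -> int:
    """Map a positive number to a bin index on a log10 scale.

    Values outside [10**low_exp, 10**high_exp) are clamped to the first or
    last bin. The range and precision are hypothetical defaults.
    """
    if value <= 0:
        raise ValueError("log-scale binning needs a positive value")
    n_bins = (high_exp - low_exp) * bins_per_decade
    index = math.floor((math.log10(value) - low_exp) * bins_per_decade)
    return max(0, min(n_bins - 1, index))


def bin_center(index: int, low_exp: int = -2, bins_per_decade: int = 1) -> float:
    """Decode a bin index to the geometric midpoint of that bin."""
    return 10 ** (low_exp + (index + 0.5) / bins_per_decade)


idx = log_bin_index(1234.5)   # falls in the bin covering [10**3, 10**4)
approx = bin_center(idx)      # geometric midpoint of that bin, 10**3.5
```

Raising `bins_per_decade` increases the precision level of the bins at the cost of a larger output vocabulary.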
Encoding and Decoding:
Value embedding can be used to both encode and decode numbers. The DICE encoder can also be used for encoding, but its output is not easily decodable.
Mixing multiple methods:
Ensembling embeddings can sometimes help, but the gains may be small. For example, exponent embeddings ensembled with DigitRNN barely outperform standalone exponent embeddings.
Tags
Natural language processing
Data Science