llm.preprocess.utils
Preprocessing script utilities.
readable_to_bytes
Convert a string with byte units to an integer number of bytes.
Source: ProxyStore
Parameters:
- size (str) – String to parse for bytes size.
Returns:
- int – Integer number of bytes parsed from the string.
Raises:
- ValueError – If the input string contains more than two parts (i.e., a value and a unit).
- ValueError – If the unit is not one of KB, MB, GB, TB, KiB, MiB, GiB, or TiB.
- ValueError – If the value cannot be cast to a float.
Source code in llm/preprocess/utils.py
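The parsing behavior documented for readable_to_bytes can be illustrated with a minimal sketch. This is not the library's source: the name parse_size_sketch is hypothetical, and the sketch assumes that KB/MB/GB/TB are decimal (powers of 1000), that KiB/MiB/GiB/TiB are binary (powers of 1024), that units are matched case-insensitively, and that the space between value and unit is optional. It is also stricter than the documented behavior in that it requires exactly two parts.

```python
import re

# Assumed multipliers: decimal for KB..TB, binary for KiB..TiB.
_UNIT_FACTORS = {
    "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30, "TIB": 2**40,
}


def parse_size_sketch(size: str) -> int:
    """Hypothetical stand-in for readable_to_bytes (illustration only)."""
    # Put a space before any run of letters so "1.5GB" and "1.5 GB" split alike.
    parts = re.sub(r"([A-Za-z]+)", r" \1", size.strip()).split()
    if len(parts) != 2:
        raise ValueError(f"expected a value and a unit, got {size!r}")
    value_text, unit = parts

    try:
        value = float(value_text)
    except ValueError:
        raise ValueError(f"cannot cast {value_text!r} to float") from None

    # Case-insensitive unit lookup (an assumption of this sketch).
    factor = _UNIT_FACTORS.get(unit.upper())
    if factor is None:
        raise ValueError(f"unknown unit {unit!r}")

    return int(value * factor)
```

Under these assumptions, parse_size_sketch("1 KB") and parse_size_sketch("1 KiB") give different results, which is the main reason to keep the decimal and binary unit families separate.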
safe_extract
Safely extract a tar file.
Note
This extraction method is designed to safeguard against CVE-2007-4559.
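CVE-2007-4559 concerns tar members whose names (for example, ../../etc/passwd) resolve outside the extraction directory. A common mitigation, sketched below, is to resolve each member's destination path and refuse to extract if any of them escapes the target directory. The function names extract_checked and _is_within are hypothetical and are not taken from llm/preprocess/utils.py; the sketch checks member names only and does not inspect link targets.

```python
import os
import tarfile


def _is_within(directory: str, target: str) -> bool:
    """Return True if target resolves to a path inside directory."""
    base = os.path.realpath(directory)
    dest = os.path.realpath(target)
    return os.path.commonpath([base, dest]) == base


def extract_checked(archive_path: str, target_dir: str) -> None:
    """Hypothetical helper: extract only if every member stays in target_dir."""
    with tarfile.open(archive_path) as tar:
        # Validate all members before writing anything to disk.
        for member in tar.getmembers():
            destination = os.path.join(target_dir, member.name)
            if not _is_within(target_dir, destination):
                raise ValueError(
                    f"refusing to extract {member.name!r}: path escapes {target_dir!r}"
                )
        tar.extractall(target_dir)
```

Validating every member before calling extractall means a malicious archive is rejected without any partial extraction. Recent Python versions also provide extraction filters in the tarfile module, which may be preferable where available.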
Parameters: