These log elements can be combined together into a single object, helping with parsing and aggregating them. For example, we could define a class in Python code in the following way:
class PriceLog(object):
def __init__(self, timestamp, product_id, price):
self.timestamp = timestamp
self.product_id = product_id
self.price = price
def __repr__(self):
return '<PriceLog ({}, {}, {})>'.format(self.timestamp,
self.product_id,
self.price)
@classmethod
def parse(cls, text_log):
'''
Parse from a text log with the format
[<Timestamp>] - SALE - PRODUCT: <product id> - PRICE: $<price>
to a PriceLog object
'''
divide_it = text_log.split(' - ')
tmp_string, _, product_string, price_string = divide_it
timestamp = delorean.parse(tmp_string.strip('[]'))
product_id = int(product_string.split(':')[-1])
price = Decimal(price_string.split('$')[-1])
return cls(timestamp=timestamp, product_id=product_id, price=price)
So, the parsing can be done as follows:
>>> log = '[2018-05-05T12:58:59.998903] - SALE - PRODUCT: 897 - PRICE: $17.99'
>>> PriceLog.parse(log)
<PriceLog (Delorean(datetime=datetime.datetime(2018, 5, 5, 12, 58, 59, 998903), timezone='UTC'), 897, 17.99)>
Avoid using float types for prices. Floats numbers have precision problems that may produce strange errors when aggregating multiple prices, for example:
>>> 0.1 + 0.1 + 0.1
0.30000000000000004
Try these two options to avoid problems:
- Use integer cents as the base unit: This means multiplying currency inputs by 100 and transforming them into integers (or whatever fractional unit is correct for the currency used). You may still want to change the base when displaying them.
- Parse into the Decimal type: The Decimal type keeps the fixed precision and works as you'd expect. You can find further information about the Decimal type in the Python docs at https://docs.python.org/3.6/library/decimal.html.
If you use the Decimal type, parse the results directly into Decimal from the string. If transforming it first into a float, you can carry the precision errors to the new type.