Currently I'm developing an application which would ingest logs of order of 70-80 GB of data/day and would then do Some analysis on them
Now the infrastructure that I have is a 4 node cluster( all nodes on Virtual Machines) , all nodes have 4GB ram.
But when I try to run the dataset (which is a sample dataset at this point ) of about 30 GB, it takes about 3 hrs to process all of it.
I would like to know is it normal for this kind of infrastructure to take this amount of time.