There often comes a time in every data scientist’s journey when they have moved past learning the fundamentals of data science and begin to apply their newly acquired knowledge to bigger and bigger datasets. Around the same time, they often also begin to experiment with more sophisticated models, such as deep learning models.

As our datasets get larger and our models more sophisticated, the compute resources they require increase as well. In the not-too-distant past, this meant that we needed to purchase better hardware: laptops with more RAM, larger disks, and faster chips. Fortunately, with the advent of cloud technologies, we can now leverage significantly larger compute resources for limited uses at a fraction of what it would cost to purchase a high-performing laptop.

Unfortunately for data science students today, these cloud resources also come with so many options for leveraging more robust compute environments that it can be confusing to know where to focus. Data science platforms such as Databricks, AWS SageMaker, and Google Colab all offer managed data science environments that give data scientists access to more robust computing resources. The downside of these more or less “managed” options is that each platform has aspects that may be difficult to navigate. For example, Google Colab uses a non-persistent disk to store your artifacts (e.g. CSV files, results output, etc.), and that disk is erased each time your session ends. Moreover, the directory structure of the disk environment may not always be easy to understand.
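One way to get oriented in an unfamiliar managed environment is to print a shallow view of its directory tree from inside a notebook cell. The sketch below is only an illustration using Python's standard library; the helper name `summarize_tree` and the `max_depth` parameter are hypothetical, and nothing here is specific to any one platform.

```python
import os


def summarize_tree(root=".", max_depth=2):
    """Return indented lines listing directories and files under root.

    Hypothetical helper for illustration; max_depth limits how far
    below root the walk descends.
    """
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    lines = []
    for current, dirs, files in os.walk(root):
        dirs.sort()  # make the traversal order predictable
        depth = current.rstrip(os.sep).count(os.sep) - base_depth
        if depth >= max_depth:
            dirs[:] = []  # prune: do not descend any further
        indent = "  " * depth
        lines.append(f"{indent}{os.path.basename(current) or current}/")
        for name in sorted(files):
            lines.append(f"{indent}  {name}")
    return lines


if __name__ == "__main__":
    # Show the current working directory and one level below it.
    print("\n".join(summarize_tree(".", max_depth=1)))
```

Running this at the start of a session, and again after writing output files, shows where artifacts are actually landing before the session ends and the disk is erased.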