Posts

Big Data analysis using Apache Spark

The Dendory Capital Datalab creates solutions, workflows, tools and pipelines for our clients' Big Data needs. As part of this process, we need to make sure the solutions we provide our clients work properly. Today, we're going to go over a simple use case: analyzing a dataset to gain useful insights about a particular problem. We're going to load a CSV file containing data from the NASA Near Earth Objects project, and try to find out whether any large object is going to come close to the planet in the next week.

Signing up for Databricks

Databricks is a commercial platform built on Apache Spark, and provides a handy web-based interface to create and manage clusters, start a notebook, and use Python code without any administration overhead. Better yet, they have a community edition we're going to be able to use for free. So the first thing to do is go to databricks.com and sign up for a community account. Once you confirm your email address, you can log into

Recent posts

Notes taking software reviewed

Taking notes is something that anyone who works with intellectual matters has to do. You typically start in high school, when the things you learn are no longer trivial and can no longer be memorized just by hearing them, and it goes on throughout a lifetime for many of us. A lot of what makes a good engineer or scientist isn't what they can remember, but how good they are at finding the answer, whether that's by referring to their notes from problems they've already solved, searching for the answer through experiments, or just Googling for it. If you want to retain that knowledge and not waste time when you need to do something that's close to something you've touched in the past, having a well-organized note-taking system is crucial. Over the years I've used a lot of tools myself, and I will review what I consider 3 of the best and most popular options here.

Apple Notes

If you use an iPhone, like many of us do, the default notes app that comes

Some big data terminology

This year I've helped a lot of clients with their big data projects, and it's likely that anyone who works in DevOps or even as a regular IT person will have to deal with big data in the coming years. Businesses rely more and more on data analytics to make decisions that wouldn't have been possible before. Whether it's an insurance company adjusting rates based on real-time car data streaming in, a security company alerting their agents automatically when something suspicious is detected on one of their many surveillance systems, or even a small business trying to gain more insights from web traffic, big data is everywhere. But before you can deal with big data, you need to know some of the common terms being used, what they refer to, and how they typically apply within the enterprise. This will allow you to successfully engage with the different stakeholders and make sure everyone is on the same page, so projects don't over-promise and under-deliver.

Types of data

Making a simple traffic analytics page

Big data has become a very important part of doing business, but there's one particular type of data that every business has to deal with, and that's web site traffic data. Web analytics, as it's commonly known, is crucial in positioning a web site because it shows you where your users came from, what kind of device they use, their browser type, and whether they clicked on an ad to get to your site. This is all important data to know if you want to run a successful online business. In the early days of the web, there used to be a lot of different analytics options, but these days it's safe to say that the vast majority of sites use Google Analytics. This is because it's well known, integrates with other Google products, and is supported out of the box in many web site builders, making it easy to implement. However, I think it has two big flaws. The first is that it's difficult to get actionable data from it. Over the years, many additional features were added, but

Granting temporary access on an AWS account

One of the many tasks that a cloud consultant may have to do is access a client's cloud environment. Whether you're having someone create some resources in your AWS network, or hiring an auditor to review your security posture, you need to grant them temporary access so they can do their job, while restricting what they can do based on the type of work they were hired to do. This means you shouldn't trust them with the root account. Instead, let's see how you can do this securely within just a few minutes.

Creating an IAM user

In order to log in to the AWS console, the person you're granting access to will need a username and password. To create one, log into your administrator account, click on Services at the top, and use the search function to go to the IAM page. There, click on Users on the left side, and then the Add user button. On the first page, you will have to give your new user a name. Make sure to pick a name that makes it obvious who this user is. Then, you have to s

What is Operations?

Technology is at the center of most businesses these days, and when an entrepreneur launches a startup, the need to hire a software developer, full stack engineer, or other type of similar employee as part of the team seems like a natural thing. But the Operations side, or what is commonly referred to as IT, is a less obvious prospect. After all, these days anyone can go to one of many hosting providers and click a few buttons, and they'll have a web site. If you don't have a physical office, then that means no need for IT, right? So what exactly is involved in the Operations side of things, and why is it a crucial part of any business?

Security

When you deploy an infrastructure in the cloud, you have to understand the separation of roles. The cloud provider, whether you're talking about a platform like AWS which allows you to create entire VMs, or a site like WordPress which hosts your web site directly, will only handle their own side of the infrastructure. This includes

The Cloud Audit - Something every business needs

Every business is now a technology company in some way, and just like computers made it into the office a few decades ago, now the cloud is becoming a staple for almost every business. Having worked in many, many client environments, I've seen that the move to the cloud almost always happens the same way. A department or project manager decides to launch their cloud adoption, which leads to an Azure or AWS tenant being created. Some instances are provisioned, network routes are set up, and the deployment grows organically from there. The result of such organic growth tends to be disorganization. Naming conventions are brought in later on, tools are decided on the fly, and even if you use infrastructure-as-code, I've seen many cases where one group is using Terraform, while another is using CloudFormation, and of course there are the break-fix instances where people go into the web console and change things manually. Almost no company more than a couple of years old out there doesn't have