A Data-Driven Future

There is a growing awareness of the power of data as a lever with which humanity can change the world for the better. This is serious stuff. Save lives. Heal our planet. Generate immense real economic value and improve the quality of life of nearly every living human being. The application of data science to the rising tide of data inundating our lives is an important part of achieving these benefits for humanity.

All segments of all industries must rapidly adapt to a data-driven economy. People and companies that bury their heads in the sand will only fall behind with greater velocity over time. Technological change is not only happening faster; its velocity is increasing, and so the magnitude of its impact, positive and negative, arrives in ever shorter time cycles. Feel like things are changing faster? They are changing faster.

When it comes to data and the science we want to do with that data, there is no one pill to cure all our ills. There is no easy button, even though that is what everyone mistakenly wants. To learn how to use data to change the world, we need to understand its core properties.

There are four things to know about data that stand out as especially important.

Data is an Asset. It is like real estate, a bond, cash in the bank or an automotive production plant. When you begin to consider data in this way, it changes how you choose to manage it over time. It changes how you assign value to data.

Data is Digital. Data is unique in many ways because it is digital. This gives it, generally speaking, a low marginal cost of replication and reuse. That can be a blessing or a curse, depending on how this property is managed and used as a lever to generate value.

Data is Strategically Valuable. As a digital asset, data is strategically valuable to the ongoing concerns of any entity, be it a person or an organization. This seems obvious, and thankfully it is becoming more so now that the tools to use the data we can generate are becoming more broadly available.

Data is Dynamic. Data will never stop changing. Even if you just leave it alone it'll rot or age. Some data has a very short half-life of value. Other data, like a fine wine, just gets better with age (and integration). If your data management systems do not account for this, you'll end up with a data cesspool instead of a pristine and beautiful data lake.

Understand these four key properties of data. Consider the implications of five billion people online, connected and communicating. Remember that data can be used to change the world. Think. Then, Do Good Things With Data for People.

There will be between 50 and 200 billion devices connected to the internet by around 2020. At the high end, that's as many as 25 devices for every person alive at that time. Today, that number is closer to three for every living person. This is currently called the Internet of Things. People are a crucial component of these things. People don't like being called things, and that is understandable. Like it or not though, people are part of the IoT.

Over half of humanity is not online yet! But they will be very soon. Right now there are about 3.15 billion people online in various ways. That will be over five billion in the next three to five years.

Access to massive data sets, and humanity's ability to use them effectively, is astounding. Algorithms are being created that are learning to do things only humans could do before, and in some cases they are learning them in very human-like ways. Today they are still usually carefully trained and parented by a loving data scientist.

The subjects touched on lightly in this post are beginnings.

---

About the Author
Kent Langley is the CEO/CTO of Ekho, Inc. and Faculty in Data Science at Singularity University. Kent advises companies and frequently acts as a Chief Technical Advisor to business and technology executives, providing technology audits, due diligence, technical architecture, and on-demand leadership. Kent is also an ExO expert, helping companies adopt new people, processes, and technologies that enable them to leverage resources effectively and grow.

About Ekho
Ekho is a company that endeavors to deliver on the Massive Transformative Purpose (MTP) to do good things with data for people. Ultimately, Ekho helps its clients derive actionable insights from their data using the best data science tools and processes available.

About Singularity University
Singularity University provides educational programs, innovative partnerships and a startup accelerator to help individuals, businesses, institutions, investors, NGOs and governments understand cutting-edge technologies, and how to utilize these technologies to positively impact billions of people.

Back.

I'm back and will be blogging again. I have been building a startup. I have been teaching. My family has grown. Collectively, that has been rather time consuming. Look here for more on what's been up and what I'm up to as I get situated.

Best,

Kent

Factoring Complexity

For many years now, when building scalable and highly available computing infrastructures, I've been doing something I call Factoring Complexity. I think a lot of this work is well served by agile project methods and ITSM concepts (such as ITIL). There may be better names for it, but I'll stick with this one for now.

I do this to achieve the least common denominator systems architecture that efficiently provides the required business service. This includes processes, people, computers, and code. To understand what I mean by this and how I approach it, two cross-discipline definitions are required. I borrowed them from some of my math classes from long ago.

From www.algebrahelp.com, a definition of factoring reads as follows:

Factoring is an important process in algebra which is used to simplify expressions, simplify fractions, and solve equations.

I typically aim to factor complexity out of infrastructure, systems, and software in general, seeking the lowest number of interconnections between components that can reliably perform the task at hand, like serving a website, while providing appropriate scalability and performance.

In the context of building web systems, Factoring Complexity serves several purposes. Two that stand out are resource application (time and money) and maintainability. From a business perspective, it is important to apply the appropriate level of resources to any particular business problem. From a practical point of view, things should be maintainable. The successful output of a round of Factoring Complexity is usually fewer connections between components: the least complex system that can adequately provide for a particular business need.

From Wikipedia, the figurative sense of the least common denominator is described as follows:

The term is used figuratively to refer to the "lowest"—least useful, least advanced, or similar—member of a class or set which is common to things that relate to members of that class.

When attempting to factor complexity, I strive to focus on a few key things: documentation, relationships, and advance planning. I'll take each of these in turn. But first, let me show a very simple and visual example of what I am talking about.

Figure 1 shows a rather complex system with a lot of interconnections. Figure 2 shows a less complex system with far fewer interconnections. Figure 3 shows a very simple system with only one interconnection. I illustrate it this way to explain something that is often missed: the complexity lies not so much in the nodes themselves as in how they interact and interconnect. The complexity increases dramatically every single time you add a node. Nodes can be software programs, development frameworks, servers, people, network connections, anything really; anything that might interact in some way with some other node.
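
To make that concrete, here is a tiny, purely illustrative Python sketch (not from any particular project) that counts the potential pairwise interconnections in a system where every node can talk to every other node. That is the worst case, but it shows how quickly the number grows:

# Worst case: every node can talk to every other node, so the number of
# potential interconnections grows as n * (n - 1) / 2.
def potential_interconnections(nodes: int) -> int:
    return nodes * (nodes - 1) // 2

for n in (2, 4, 8, 16, 32):
    print(n, "nodes ->", potential_interconnections(n), "potential interconnections")

Two nodes give you one interconnection, eight give you 28, and thirty-two give you 496. Even if a real system only uses a fraction of those, every new node multiplies the opportunities for components to interact in ways you did not plan for.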

In the context of refactoring existing web infrastructure, one way to attack overly complex systems is to focus on the interconnections. If you can begin to eliminate interconnections without negatively impacting performance, scalability, availability, and capacity, then you have done an excellent thing. You have reduced the overall complexity of the system while maintaining or even improving its manageability and capability as a whole. Sometimes people don't believe this is possible when they first see the concept, but I assure you it is; I have personally used it successfully many times, professionally and personally. One might treat the number of interconnections as a rough inverse indicator of the overall effectiveness of any given environment.

Documentation is a critical factor in getting any environment under control and reaching a position of relative stability and predictability. In the context of Factoring Complexity, I am primarily talking about first documenting all of the known and discoverable components of a given system, as much as possible and at an appropriate level of detail. Then, even more importantly, documenting the relationships of one component to another in as much detail as possible. These relationships are very important for the next steps.

If you want to get started factoring complexity in your compute environments, there are three key things to document. One, physical components. Two, abstract services. Three, the relationships between these items. In ITIL these are called CIs, Configuration Items. The relationships between CIs are what we are looking for here. They are tracked in a CMDB. CMDB is a nasty four-letter word in some places because they can be very challenging to implement. The CMDB, or Configuration Management DataBase, is in essence a social graph. There are some great pieces of software finally emerging that can easily handle the kinds of complexity and number of interconnections that many IT environments present. One of the more interesting ones to me is Neo4j. It is, in the site's words,

Neo4j is a graph database. It is an embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables. A graph (mathematical lingo for a network) is a flexible data structure that allows a more agile and rapid style of development.

But that's all for now; how to make Neo4j into a killer CMDB is just an idea as far as I know. It makes a lot of sense though, doesn't it?
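
To be clear about what I mean by tracking CIs and their relationships, here is a minimal, hypothetical sketch in plain Python. The CI names are made up, and this is not Neo4j, just an in-memory stand-in for the same graph idea: store each CI with the CIs it depends on, then surface the most heavily connected ones, which are usually the best candidates for factoring.

from collections import Counter

# A toy CMDB: each configuration item (CI) maps to the CIs it depends on.
# All names here are hypothetical placeholders.
cmdb = {
    "web01": ["lb01", "app01"],
    "app01": ["db01", "cache01", "queue01"],
    "queue01": ["db01"],
    "db01": ["san01"],
    "cache01": [],
    "lb01": [],
    "san01": [],
}

# Count every connection a CI participates in, inbound or outbound.
degree = Counter()
for ci, dependencies in cmdb.items():
    degree[ci] += len(dependencies)
    for dependency in dependencies:
        degree[dependency] += 1

# The most connected CIs are where a round of Factoring Complexity
# usually pays off first.
for ci, connections in degree.most_common(3):
    print(ci, connections)

In a real environment the same graph would live in something like Neo4j rather than a Python dictionary, but the question you ask of it is the same: where are the interconnections piling up?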

How Far Does the Addressable Cloud Reach?

I was recently impressed by this entry on Wikipedia about Voyager 1. It said, “Operating for 34 years, 6 months and 30 days as of today, the spacecraft receives routine commands and transmits data back.” (source link). This too is part of the cloud now. Our cloud now reaches approximately 1.8×10^10 km away from wherever you happen to be sitting right now, beyond our own solar system and into the heliosheath. Courtesy of the Voyager 1 mission and some rather forward thinking scientists, we have vastly expanded the human cloud of information and data in at least this important way.
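
As a back-of-the-envelope illustration of what "addressable" means at that range (assuming the distance quoted above), here is the one-way signal time worked out in Python:

# Rough one-way light time to Voyager 1 at roughly 1.8 x 10^10 km.
distance_km = 1.8e10
speed_of_light_km_per_s = 299_792.458
one_way_hours = distance_km / speed_of_light_km_per_s / 3600
print(round(one_way_hours, 1), "hours one way")  # roughly 16.7 hours

A routine command therefore takes the better part of a day to make the round trip, and that spacecraft is still part of the cloud.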

The B612 Foundation is working to put satellites in orbit around the sun, and there is already an interplanetary internet between Earth, the Moon, and Mars.

Via networks of micro satellites and drones (which we already have, technically), we will have real-time, high-definition video anywhere on Earth at the push of a button. You can already, for a relatively modest price, put your own micro satellite in orbit. All of these devices are connected, or will be connected, to the cloud.

The cloud does not just exist in a data center somewhere. It’s in your pocket. It’s on your TV. It’s in space. DARPA is working on a mobile, smartphone-based private cloud.

The cloud is not Facebook, Apple, Amazon or Google. Those companies are major influencers in the cloud of course and brilliant in so many ways. But, they are just the beginning.

It’s cloud all the way to the edge and the edge boundary is expanding very quickly.