The Austrian entity of a worldwide telecommunications company involved us in their Big Data transformation initiative. While running their IT operations on traditional databases, data warehouses and enterprise analytics, they realised that those tools could be replaced to handle more data with fewer resources and at lower cost, mainly by switching to open-source software. We contributed to the following two use cases by leveraging our Big Data expertise.
The first use case was to replace an existing solution built on expensive vendor products (Oracle, SAS) with open-source alternatives, and at the same time to improve data science modelling capabilities and efficiency. The legacy data pipeline started with a data warehouse in which customer data was stored. This data was then loaded into the analytics solution, which used proprietary heuristics and algorithms to identify customers who were about to leave the company. Once such customers are identified, the company can take steps to retain them and thereby reduce churn; these actions are usually personalised marketing offers, giving the customer exclusive deals and discounts. Moreover, by identifying what leads to churn, the business can take the right steps to reduce it in the future by aligning strategy, products and service offerings.
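The vendor's heuristics were proprietary, so they cannot be reproduced here, but the general shape of churn scoring can be sketched. The following minimal Python sketch is purely illustrative: the features, weights and threshold are invented assumptions, not the client's actual model.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: str
    months_active: int
    support_tickets_90d: int
    usage_drop_pct: float  # relative usage drop vs. previous quarter

def churn_score(c: Customer) -> float:
    """Toy churn score in [0, 1]; higher means more likely to leave.

    The real heuristics were proprietary; these rules and weights
    are invented purely for illustration.
    """
    score = 0.0
    if c.usage_drop_pct > 0.3:       # sharp drop in usage
        score += 0.5
    if c.support_tickets_90d >= 3:   # repeated support issues
        score += 0.3
    if c.months_active < 6:          # new customers churn more often
        score += 0.2
    return min(score, 1.0)

customers = [
    Customer("c1", 3, 4, 0.5),    # new, many tickets, sharp usage drop
    Customer("c2", 36, 0, 0.05),  # long-term, stable
]
at_risk = [c.customer_id for c in customers if churn_score(c) >= 0.5]
# at_risk contains only "c1"
```

In the real pipeline, customers flagged this way would be fed to the marketing systems that drive the retention offers described above.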
We replaced the legacy data pipeline with a new one based on Big Data technologies, allowing real-time streaming to downstream systems and a data lake. The client's private cloud installation hosted a Hadoop-based data lake on which Apache Spark jobs ran the analytics. With the help of black-box testing (for the same input, the same output should be produced, regardless of what is inside the box), a team of data scientists created an open-source version of the churn algorithm, mimicking the behaviour of the vendor-based one using random decision forests in R. This prototype was later translated into a production-ready Scala-based version.
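The black-box testing idea can be illustrated with a small comparison harness. This is a sketch, not the client's actual test suite; the two one-line "models" below are hypothetical stand-ins for the proprietary vendor scorer and its open-source replacement.

```python
def equivalent(legacy_predict, new_predict, inputs, tolerance=0.0):
    """Black-box check: for every input, both models must agree.

    legacy_predict / new_predict are treated as opaque scoring
    functions; we only compare their outputs, never inspect
    their internals. Returns the list of disagreements.
    """
    mismatches = []
    for x in inputs:
        a, b = legacy_predict(x), new_predict(x)
        if abs(a - b) > tolerance:
            mismatches.append((x, a, b))
    return mismatches

# Hypothetical stand-ins for the two implementations:
legacy = lambda x: 1.0 if x > 10 else 0.0
rewrite = lambda x: float(x > 10)

assert equivalent(legacy, rewrite, range(0, 21)) == []
```

A nonzero `tolerance` is useful when the replacement produces probabilities rather than hard labels, since two different model families will rarely agree to the last decimal.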
As a result, the client could cancel their licence with the analytics vendor, saving significant costs, and gained a self-service, notebook-based data science environment in which the model and the surrounding use cases can be handled flexibly and easily.
A mobile cellular network has two essential components: handheld devices and radio towers. Handheld devices communicate with the towers via wireless radio transmission, and the towers are wired to the mobile network's core infrastructure. Urban areas are densely populated with radio towers, and rural regions need to be sufficiently covered as well to provide proper service. Radio towers emit a large amount of data in real time. Planning and maintaining such a cellular network is not an easy task: radio towers continuously need to be updated with the latest technologies, parts need to be replaced before failure for consistent service availability, and future hotspots need to be identified to sustain the level of service. This requires storing the large amounts of data produced by the radio towers and combining it with geospatial information (maps, terrain, objects) to understand the present.
The key to unlocking real benefits in this use case was not just to understand the present but to predict the future, by looking at patterns in the past and forming reasonable theories about what comes next. To do that, historical data from all the radio towers needs to be retained and analysed. This is a classic Big Data problem, as that amount of data can only be stored in a data centre, and efficient processing requires several computers with sizeable computational capacity working on it simultaneously. Once patterns of previous hardware failures are identified, they can be projected into the future, giving accurate estimates of upcoming issues. Preventing these proactively, instead of reacting to faults after they have already happened, increases the level of service and reduces operational cost; this is predictive maintenance. This scenario is good for maintaining the current state of the network, but often that is not enough.
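The projection step can be sketched in a few lines. The real pipeline ran as Spark jobs over the full tower history; the toy failure history, MTBF approach and safety factor below are illustrative assumptions, not the client's actual maintenance rules.

```python
from statistics import mean

def mean_time_between_failures(failure_timestamps):
    """MTBF from a sorted list of historical failure times (in days)."""
    intervals = [b - a for a, b in zip(failure_timestamps, failure_timestamps[1:])]
    return mean(intervals)

def due_for_replacement(age_days, mtbf, safety_factor=0.8):
    """Flag a part once it reaches a fraction of the observed MTBF,
    so it is replaced *before* the expected failure point."""
    return age_days >= safety_factor * mtbf

# Hypothetical failure history of one tower component (days since install):
history = [0, 400, 790, 1210]
mtbf = mean_time_between_failures(history)  # intervals 400, 390, 420 -> ~403 days

print(due_for_replacement(age_days=350, mtbf=mtbf))  # True: 350 >= 0.8 * ~403
print(due_for_replacement(age_days=100, mtbf=mtbf))  # False
```

At scale, the same idea is applied per component type across thousands of towers, which is what makes the distributed storage and processing described above necessary.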
With new technologies, new tower locations need to be added and existing ones abandoned or moved for better coverage, responding to the requirements of those technologies and to changes in real life (e.g. new buildings, or an area becoming more densely populated). Unlocking this insight is also possible with historical data; however, it needs to be combined with additional datasets such as geospatial data and urban planning forecasts.
Predictive analytics, algorithms, models and techniques can solve these challenges.
Technologies: Apache Spark, Hadoop, Kafka, Hive, Oozie.