By AI Trends Staff
Data science is a fairly new field, so advice on how to do the job right is coming from all directions. As AI is rolled out inside organizations, communicating a business strategy that incorporates AI, and inevitably data science, becomes crucial.
With the advance of machine learning in financial systems, enormous amounts of data can be stored, analyzed, calculated and interpreted, without explicit programming.
Aspiring data scientists with a good understanding of the relevant AI technology need to understand the business objectives, then proceed with data validation. A guide to appropriate machine learning applications in finance was recently compiled by DataFlair, an online training provider.
The list includes: Security, since machine learning algorithms need only split seconds to assess transactions, putting them in a position to spot fraud in real time; Financial Monitoring, which can detect and flag a large number of micropayments, for example, making it useful for spotting money laundering; Fraud Detection, enabled by spotting patterns and using predictive analytics on high volumes of data to block fraudulent transactions; Investment Predictions, which rely on machine learning to identify market changes earlier than previously possible; Robo Advisors, for portfolio management and recommendation of financial products to optimize the client’s assets; and Algorithmic Trading, pre-programmed trading that takes many variables into account, using predictions and analysis of historic behavior to determine an optimal market strategy.
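To make the Financial Monitoring item concrete, here is a minimal sketch of flagging bursts of micropayments. It is a simple rule-based baseline, not a machine learning model and not DataFlair's method; the function name, the thresholds (`max_amount`, `min_count`, `window`) and the sample transactions are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def flag_micropayment_bursts(transactions, max_amount=5.00, min_count=5,
                             window=timedelta(hours=1)):
    """Return accounts with at least `min_count` small payments inside `window`.

    `transactions` is an iterable of (account, timestamp, amount) tuples.
    All thresholds are illustrative assumptions, not regulatory values.
    """
    # Collect the timestamps of small payments per account.
    small_payment_times = defaultdict(list)
    for account, timestamp, amount in transactions:
        if amount <= max_amount:
            small_payment_times[account].append(timestamp)

    flagged = set()
    for account, times in small_payment_times.items():
        times.sort()
        start = 0
        # Sliding window over the sorted timestamps.
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= min_count:
                flagged.add(account)
                break
    return flagged


# Hypothetical sample data.
base = datetime(2020, 1, 1, 12, 0)
transactions = (
    # Account A: six 1.00 payments, five minutes apart (a burst).
    [("A", base + timedelta(minutes=5 * i), 1.00) for i in range(6)]
    # Account B: two large payments.
    + [("B", base, 250.00), ("B", base + timedelta(minutes=10), 900.00)]
    # Account C: five small payments, but spread two hours apart.
    + [("C", base + timedelta(hours=2 * i), 2.00) for i in range(5)]
)

flagged = flag_micropayment_bursts(transactions)
print(flagged)
```

In practice a learned model would replace the fixed thresholds, but a rule like this is a common starting point for checking whether the data supports the use case at all.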
Bright young data scientists may head into their careers thinking they can do no wrong, when in fact they might not be seeing the forest for the algorithms. Dan Becker, team lead for Kaggle Learn, recently posted a piece on KDnuggets on what 70% of data science learners do wrong. (Note: Kaggle is an online community of data scientists and machine learners, owned by Google LLC.)
“Corporate data science is still a new field,” he stated, noting that many teachers in academia have not worked on real problems for real businesses, so they may be teaching textbook algorithms in a way separate from reality and the business context. “This can be intellectually fun. But, students are mistaken if they assume these courses prepare them well to work as data scientists.”
His suggested guidelines for focusing your efforts on practically important skills include the following:
Use standard open source libraries. These libraries are well-documented, well-tested and have well-designed APIs.
Spend time manipulating your data into the format you need. Many projects involve much data manipulation and little model tuning. Many job candidates can describe algorithms, but are lacking in Pandas skills. (Note: Pandas is an open source Python library for data manipulation and analysis.)
Learn about techniques in the context of applications. “If you need technical jargon to describe the practical relevance of what you are learning, you probably aren’t ready to apply it,” Becker states.
Learn how to interpret model output. You need to understand measures of accuracy, to know if you can trust a model. Learn machine learning explainability techniques, such as permutation importance. (Note: Permutation feature importance is a model inspection technique that measures how much a model’s score drops when the values of a single feature are randomly shuffled.)
Build projects in a domain you find interesting. The single most important skill might be the ability to share your work, to interpret and discuss the results.
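The permutation importance technique mentioned in the guidelines above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Becker's code or any library's API: `toy_model`, the synthetic data and the `n_repeats` and `seed` parameters are hypothetical. The idea is to shuffle one feature column at a time and record how far accuracy falls.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the correct label."""
    correct = sum(1 for row, label in zip(X, y) if model(row) == label)
    return correct / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other columns intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [value] + row[j + 1:]
                      for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical model: uses feature 0 and ignores feature 1."""
    return 1 if row[0] > 0.5 else 0


# Synthetic data: labels are produced by the toy model itself.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

Shuffling feature 0 should cut accuracy sharply, while shuffling feature 1 changes nothing because the toy model never reads it. With a real model, the same comparison shows which inputs the predictions actually depend on.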
Tools Needed to Support Machine Learning Workflows
A new set of tools is required to support machine learning workflows, advises a recent account in AnalyticsIndiaMag written by Mathangi Sri, leader of the data science team at PhonePe, supplier of an Indian e-commerce payment system.
Most organizations are separating deployment from the data scientist’s role, but the needed hardware depends on the complexity of the problem. Teams often start with a good configuration of CPU machines in the cloud. The hardware deployment platform needs to match the types of data science problems to be solved. GPUs are needed for certain classes of problems, while CPUs can handle others.
How the Data Science team is organized is important. If the infrastructure is not right, the team could be underutilizing the power of predictions and the available human capital.
The silicon providers are building more end-to-end hardware and software solutions to keep pace with changing AI workload requirements. Models are getting more complex and more inference is happening at the edge.
With this in mind, the writer favors 2nd Gen Intel® Xeon® Scalable processors for their flexibility across both AI and the vast range of data-centric workloads. The processors promise an AI acceleration push, coupled with Intel® DL Boost, which is tailored for deep learning inferencing.