Introduction to Probability and Statistics

Code: DATA1004
School: Computer Science
Level: 1
Credits: N/A
Semesters: Spring UK

Summary

This teaching block provides the introduction to probability and statistics needed to underpin further statistical analysis in data science. The module introduces the mathematical framework for reasoning about uncertainty, based on Bayes’ Theorem, and demonstrates how it can be applied to analyse data and draw conclusions from it (with particular reference to the Central Limit Theorem). Building on these foundations in probability, the module then develops a range of statistical ideas and skills: modelling data with probability models, estimating important parameters, using hypothesis testing to draw conclusions about the data, and using linear regression to estimate trends. An appropriate statistical package is used to apply the principles and methods described in the material.

Target Students

Students must be studying towards the Data Scientist Degree Apprenticeship.

Classes

18-week teaching block, comprising:

  1. 2 hours per week of online e-learning content (7 weeks, 14 hours in total).
  2. 3 hours per week of online e-learning content (8 weeks, 24 hours in total).
  3. 1-hour online workshop sessions (3 in total, 3 hours).
  4. A 1-hour lab workshop, delivered online, to introduce the statistical software R.
  5. A 5-hour block release day.
  6. A 4-hour block release day.

Assessment

Assessed by the end of the designated period.

Educational Aims

The purpose of this module is to provide an introduction to probability, probabilistic reasoning and statistical inference. It will also give Apprentices a good grounding in practical data analysis, as well as the ability to use a computer package and write a report. Apprentices will acquire knowledge and skills relevant to a professional statistician.

Learning Outcomes

KSBs (Knowledge, Skills and Behaviours)

K3. How data can be used systematically, through an awareness of key platforms for data and analysis in an organisation, including:

  1. Data processing and storage, including on-premise and cloud technologies.
  2. Database systems, including relational, data warehousing and online analytical processing, "NoSQL" and real-time approaches, and the pros and cons of each approach.
  3. Data-driven decision making and the good use of evidence and analytics in making choices and decisions.

K4. How to design, implement and optimise analytical algorithms, as prototypes and in production, using:

  1. Statistical and mathematical models and methods.
  2. Advanced and predictive analytics, machine learning and artificial intelligence techniques, simulations, optimisation, and automation.
  3. Applications such as computer vision and Natural Language Processing.
  4. An awareness of the computing and organisational resource constraints and trade-offs involved in selecting models, algorithms and tools.
  5. Development standards, including programming practice, testing and source control.

K5. The data landscape: how to critically analyse, interpret and evaluate complex information from diverse datasets:

  1. Sources of data, including but not limited to files, operational systems, databases, web services, open data, government data, news and social media.
  2. Data formats, structures and data delivery methods including "structured" data.
  3. Common patterns in real-world data.

S1. Identify and clarify problems an organisation faces, and reformulate them into Data Science problems. Devise solutions and make decisions in context by seeking feedback from stakeholders. Apply scientific methods through experiment design, measurement, hypothesis testing and delivery of results. Collaborate with colleagues to gather requirements.

S2. Perform data engineering: create and handle datasets for analysis. Use tools and techniques to source, access, explore, profile, pipeline, combine, transform and store data, and apply governance (quality control, security, privacy) to data.

S3. Identify and use an appropriate range of programming languages and tools for data manipulation, analysis, visualisation, and system integration. Select appropriate data structures and algorithms for the problem. Develop reproducible analysis and robust code, working in accordance with software development standards, including security, accessibility, code quality and version control.

S4. Use analysis and models to inform and improve organisational outcomes, building models and validating results with statistical testing: perform statistical analysis, correlation vs causation, feature selection and engineering, machine learning, optimisation, and simulations, using the appropriate techniques for the problem.

S5. Implement data solutions, using relevant software engineering architectures and design patterns. Evaluate Cloud vs. on-premise deployment. Determine the implicit and explicit value of data. Assess value for money and Return on Investment. Scale a system up/out. Evaluate emerging trends and new approaches. Compare the pros and cons of software applications and techniques.

S6. Find, present, communicate and disseminate outputs effectively and with high impact through creative storytelling, tailoring the message for the audience. Use the best medium for each audience, such as technical writing, reporting and dashboards.

Conveners

Last updated 07/01/2025.