Statistical Models and Methods

Code	School	Level	Credits	Semesters
DATA2001	Computer Science	2	N/A	Full Year UK

Summary

This teaching block introduces statistical concepts and methods and gives an appreciation of the scope of the subject. The first part introduces a wide class of techniques, such as regression, analysis of variance, analysis of covariance and experimental design, which are used in a variety of data science applications. The second part demonstrates the central role of parametric statistical models. The key concepts of inference, including estimation and hypothesis testing, will be described, as well as confidence intervals and likelihood ratio tests. Practical experience will be gained through the use of a statistical computer package.
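As a rough illustration of the estimation and confidence interval ideas mentioned above, the sketch below computes maximum likelihood estimates under a normal model and an approximate confidence interval for the mean. It is only a sketch: Python's standard library stands in for the statistical package (the module description does not name one), the function names `normal_mle` and `approx_mean_ci` are hypothetical, and the data are invented.

```python
import math
from statistics import NormalDist, fmean, pvariance


def normal_mle(sample):
    """Return the maximum likelihood estimates (mean, variance) under a normal model."""
    mu_hat = fmean(sample)
    # The MLE of the variance divides by n (not n - 1), which is what pvariance does.
    var_hat = pvariance(sample, mu=mu_hat)
    return mu_hat, var_hat


def approx_mean_ci(sample, level=0.95):
    """Approximate confidence interval for the mean, based on the normal quantile."""
    mu_hat, var_hat = normal_mle(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(var_hat / len(sample))
    return mu_hat - half_width, mu_hat + half_width


# Hypothetical measurements, invented for illustration only.
data = [4.9, 5.1, 5.3, 4.7, 5.0, 5.2, 4.8, 5.0]

mu_hat, var_hat = normal_mle(data)
lower, upper = approx_mean_ci(data)
print(f"mean estimate: {mu_hat:.3f}, variance estimate: {var_hat:.3f}")
print(f"approximate 95% interval for the mean: ({lower:.3f}, {upper:.3f})")
```

With a sample this small, an exact interval based on the t distribution would be wider; the normal quantile is used here only because it relies on large-sample (asymptotic) reasoning.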

Target Students

Only available to those studying towards the Data Scientist Degree apprenticeship programme.

Classes

15-week teaching block:
3 hours per week of e-learning online content (13 in total, 39 hours)
2 hours per week of e-learning online content (1 in total, 2 hours)
1-hour online workshop session (1 in total, 1 hour)
6-hour block release day
3-hour block release day
Total hours from the delivery plan: 51 hours

Assessment

100% Assessment: Completion of the teaching block

Assessed by end of designated period

Educational Aims

The purpose of this course is to introduce a wide range of statistical concepts and methods fundamental to applications of statistics in data science. It will also introduce the key concepts and theory of linear models, illustrating their application via practical examples drawn from real-life situations. Apprentices will acquire knowledge and skills of relevance to a professional statistician.

Learning Outcomes

Apply methods concerning estimation of parameters in standard statistical models, in particular the method of moments and the maximum likelihood method.
Apply methods for interval estimation, in particular exact and approximate confidence intervals based on asymptotic theory.
Perform statistical hypothesis tests using data from studies (such as t-tests and F-tests, comparison of models and parameter values).
Apply methods for analysing categorical data, and methods that do not require distributional assumptions (non-parametric statistics).
Fit a linear model to data using statistical software.
Check model fit, diagnose errors, and perform model selection amongst the class of linear models.
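
To give a flavour of the last two outcomes, the sketch below fits a simple linear model by least squares and computes two basic fit checks (the residuals and R-squared). It is an assumption-laden illustration rather than the module's own material: plain Python stands in for the statistical software, the helper names `fit_simple_linear` and `r_squared` are hypothetical, and the data are invented.

```python
from statistics import fmean


def fit_simple_linear(x, y):
    """Least-squares estimates (intercept, slope) for the model y = a + b * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Proportion of the variation in y explained by the fitted line."""
    y_bar = fmean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_linear(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
r2 = r_squared(x, y, intercept, slope)

print(f"intercept: {intercept:.3f}, slope: {slope:.3f}, R-squared: {r2:.4f}")
print("residuals:", [round(r, 3) for r in residuals])
```

In practice, checking model fit would also involve plotting the residuals against the fitted values to look for curvature or non-constant variance, which this sketch leaves out.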

KSBs

K3. How data can be used systematically, through an awareness of key platforms for data and analysis in an organisation, including:

Data processing and storage, including on-premise and cloud technologies.
Database systems including relational, data warehousing & online analytical processing, “NoSQL” and real-time approaches; the pros and cons of each approach.
Data-driven decision making and the good use of evidence and analytics in making choices and decisions.

K4. How to design, implement and optimise analytical algorithms – as prototypes and at production scale – using:

Statistical and mathematical models and methods.
Advanced and predictive analytics, machine learning and artificial intelligence techniques, simulations, optimisation, and automation.
Applications such as computer vision and Natural Language Processing.
An awareness of the computing and organisational resource constraints and trade-offs involved in selecting models, algorithms and tools.
Development standards, including programming practice, testing, source control.

K5. The data landscape: how to critically analyse, interpret and evaluate complex information from diverse datasets:

Sources of data including but not exclusive to files, operational systems, databases, web services, open data, government data, news and social media.
Data formats, structures and data delivery methods including “unstructured” data.
Common patterns in real-world data.

S1. Identify and clarify problems an organisation faces and reformulate them into Data Science problems. Devise solutions and make decisions in context by seeking feedback from stakeholders. Apply scientific methods through experiment design, measurement, hypothesis testing and delivery of results. Collaborate with colleagues to gather requirements.

S2. Perform data engineering: create and handle datasets for analysis. Use tools and techniques to source, access, explore, profile, pipeline, combine, transform and store data, and apply governance (quality control, security, privacy) to data.

S3. Identify and use an appropriate range of programming languages and tools for data manipulation, analysis, visualisation, and system integration. Select appropriate data structures and algorithms for the problem. Develop reproducible analysis and robust code, working in accordance with software development standards, including security, accessibility, code quality and version control.

S4. Use analysis and models to inform and improve organisational outcomes, building models and validating results with statistical testing: perform statistical analysis, correlation vs causation, feature selection and engineering, machine learning, optimisation, and simulations, using the appropriate techniques for the problem.

S5. Implement data solutions, using relevant software engineering architectures and design patterns. Evaluate Cloud vs. on-premise deployment. Determine the implicit and explicit value of data. Assess value for money and Return on Investment. Scale a system up/out. Evaluate emerging trends and new approaches. Compare the pros and cons of software applications and techniques.

B3. Adaptability and dynamism when responding to varied tasks and organisational timescales, and pragmatism in the face of real-world scenarios.

B5. An impartial, scientific, hypothesis-driven approach to work, rigorous data analysis methods, and integrity in presenting data and conclusions in a truthful and appropriate manner.

Conveners

Dr Lisa Mott

Last updated 07/01/2025.