I’m a data analyst, statistician, engineer, and programmer with over 20 years’ experience. I have expertise across the whole field of data analytics, from building ETL pipelines through to statistical modelling and forecasting. I’ve used most data analytics platforms, old and new, including Databricks, Snowflake, SAS, R, Python and most SQL databases.
• ETL and data engineering using SAS, Snowflake, Databricks and SQL databases.
• Analysis and reporting with SAS, Python, R, Power BI, Excel and many other tools.
• Programming in many languages, including SQL, SAS, Python, and R.
• Designing charts and other visualisations, including maps.
• Reporting with many tools, including R, SAS, JavaScript (D3.js), Python, Tableau, Qlik and Excel.
• Exploratory data analysis, time-series analysis and forecasting.
• Classification, clustering and segmentation, and regression.
Most recently I worked on a project at the Ministry of Social Development to replace a data warehouse with Snowflake. I worked on the ETL pipeline, combining raw data from the legacy warehouse with streams from live systems, and on the refined layer of the system: deduplicating and cleaning data and merging data from multiple source systems. Before that I worked for ACC, helping to identify providers moving to a new online system, integrating their details from a number of existing systems, and assisting with cleaning and uploading the data. Previously I worked at the Ministry of Social Development on three separate contracts. In 2021 I refactored a set of critical reports to eliminate dependencies, speed up processing, and make the reports more tolerant of system failures. In 2020 I worked on the data warehouse team's code deployment system, upgrading it to run on newer versions of SAS and adding further safeguards to the deployment process. In 2018 I was engaged to refactor code running on a grid server; I improved its performance and introduced incremental data updates. At Statistics New Zealand I conducted a technical review of the regional GDP SAS system code, providing documentation of issues, recommendations for improvements and best practice, and proof-of-concept code. At the Ministry of Health I conducted a technical review of the use of SAS within an analytical team, delivered a report of findings and recommendations to management and technical staff, and assisted in updating legacy code.
I was engaged on two contracts, at the Ministry of Education and at the Ministry of Business, Innovation & Employment (MBIE). At the Ministry of Education I was engaged for SAS data reporting and analytics. At MBIE I used R to create software that extracted and compiled pricing data from electricity distributor websites.
At DHBNZ I worked on health workforce modelling, using logistic regression to model workforce attrition and combining demographic and morbidity data to predict demand. Later I worked on monitoring hospital quality and productivity, using financial data in an activity-based costing framework and morbidity-standardised quality indicators.
I worked on many contracts with Sysware, including contracts at the Ministries of Social Development and Education, ACC, the IRD, Child, Youth & Family Services, and the Tertiary Education Commission.
At the Ministry of Health my main roles were calculating inter-district flows (outputs) as part of the population-based funding model, estimating hospital throughput, and costing and pricing health services. I introduced entirely new national data collections for non-admitted patient data and a new price collection framework.
My second role with Statistics New Zealand was as supervisor of the index development team in Wellington, where my main responsibility was overseeing the rebasing of the Consumer Price Index and developing other price indexes. I managed a small team of statisticians.
I worked on tax and survey data integration projects, most importantly the incorporation of GST data into short-term economic indicators. I also worked on the redevelopment of surveys, including the retail and wholesale trade surveys.
I majored in Computer Science and also studied Mathematics, Information Systems, Cognitive Science, and Philosophy.
I majored in Economics and also studied Mathematics, Statistics, Operations Research, Business, Accounting, and Law.
As part of the migration of the Ministry of Social Development's data warehouse, historic records from thousands of SAS datasets had to be combined with data from the new extraction process in new Snowflake tables, presenting a seamless view to analysts. I wrote code to generate the necessary DDL dynamically, using the information schemas of both sources as well as the mapping tables used in the old warehouse. Each generated query handled column renaming, data type conversion, and a contiguous history for each source record. I then automated the code generation with Linux shell scripts so that legacy tables could be converted in batches.
The critical reports project at the Ministry of Social Development aimed to migrate several operational reports out of the data warehouse to improve their stability and timeliness. The project used a test-driven development approach, starting with a battery of tests to ensure the reports reproduced previous results. This was followed by an analysis of the existing data sources and SAS code, and finally a complete rewrite of the reports in SQL so that the code could be moved to the source systems.
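A minimal sketch of the metadata-driven code generation idea, written in Python for illustration rather than as the SQL and shell scripts actually used. It assumes column metadata has already been read from each source's `INFORMATION_SCHEMA.COLUMNS` into simple records, and that a mapping table supplies legacy-to-new column names; every table, column and function name here is hypothetical. It builds a Snowflake view that renames and casts the legacy columns and stacks them on the new extract with `UNION ALL`; the history-contiguity logic is not attempted.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """One row of column metadata, e.g. from INFORMATION_SCHEMA.COLUMNS."""
    name: str
    data_type: str


def generate_union_view(view_name, legacy_table, legacy_cols,
                        new_table, new_cols, rename_map):
    """Build DDL for a view that stacks a legacy table on a new extract.

    rename_map maps legacy column names (upper case) to new column names,
    standing in for the old warehouse's mapping tables. The new table's
    column order and data types drive both halves of the UNION ALL.
    Legacy columns with no target in the new model are not selected.
    """
    # Which legacy column feeds each target column, after renaming.
    legacy_by_target = {}
    for col in legacy_cols:
        source = col.name.upper()
        legacy_by_target[rename_map.get(source, source)] = source

    legacy_select, new_select = [], []
    for col in new_cols:
        target = col.name.upper()
        source = legacy_by_target.get(target)
        if source is None:
            # No legacy equivalent: a typed NULL keeps the UNION aligned.
            legacy_select.append(f"CAST(NULL AS {col.data_type}) AS {target}")
        else:
            # Rename and convert the legacy column to the new data type.
            legacy_select.append(f"CAST({source} AS {col.data_type}) AS {target}")
        new_select.append(target)

    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT {', '.join(legacy_select)}\nFROM {legacy_table}\n"
        "UNION ALL\n"
        f"SELECT {', '.join(new_select)}\nFROM {new_table};"
    )


# Hypothetical example: metadata as it might be read from each source.
legacy_columns = [Column("CLNT_ID", "NUMBER"), Column("START_DT", "VARCHAR")]
new_columns = [
    Column("CLIENT_ID", "NUMBER(10,0)"),
    Column("START_DATE", "DATE"),
    Column("SOURCE_SYSTEM", "VARCHAR"),
]
ddl = generate_union_view(
    "ANALYTICS.CLIENT_HISTORY", "LEGACY.CLIENT", legacy_columns,
    "RAW.CLIENT", new_columns,
    {"CLNT_ID": "CLIENT_ID", "START_DT": "START_DATE"},
)
print(ddl)
```

Driving both SELECT lists from the new model's column order is what lets the generator run unattended over many tables: any column missing on the legacy side becomes a typed NULL instead of a misaligned union.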
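As an illustration of the kind of reproduction test written before the rewrite, here is a small Python sketch. It assumes both the legacy SAS output and the rewritten SQL output can be loaded as lists of rows keyed by a unique report field; the function name, key field and tolerance are hypothetical. It flags rows missing from either side and values that differ, comparing numbers within a tolerance.

```python
import math


def compare_report_outputs(legacy_rows, new_rows, key, tolerance=1e-9):
    """Compare two report outputs row by row.

    legacy_rows, new_rows: lists of dicts, one per report row.
    key: name of the field that uniquely identifies a row.
    Returns a list of discrepancy messages (empty if the outputs match).
    """
    legacy = {row[key]: row for row in legacy_rows}
    new = {row[key]: row for row in new_rows}
    problems = []

    # Rows present on only one side.
    for k in sorted(legacy.keys() - new.keys()):
        problems.append(f"{key}={k}: missing from new output")
    for k in sorted(new.keys() - legacy.keys()):
        problems.append(f"{key}={k}: not in legacy output")

    # Field-by-field comparison of rows present on both sides.
    for k in sorted(legacy.keys() & new.keys()):
        old_row, new_row = legacy[k], new[k]
        for field in sorted(old_row.keys() | new_row.keys()):
            a, b = old_row.get(field), new_row.get(field)
            if isinstance(a, (int, float)) and isinstance(b, (int, float)):
                # Numbers may differ slightly between SAS and SQL arithmetic.
                if not math.isclose(a, b, rel_tol=tolerance, abs_tol=tolerance):
                    problems.append(f"{key}={k}: {field} {a} != {b}")
            elif a != b:
                problems.append(f"{key}={k}: {field} {a!r} != {b!r}")
    return problems
```

In a test-driven workflow the expected (legacy) output is captured first, so every iteration of the rewrite can be checked until the function returns an empty list.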
To expand the range of quality indicators, I used new external cause coding in national inpatient data to produce new hospital quality indicators for in-hospital falls and volume depletion (shock, blood loss and dehydration). We later employed the same methods to address shortcomings in long-standing quality indicators, including those for pressure injuries and urinary tract infections.
I developed statistical models to forecast the supply of and demand for the nursing workforce. This included logistic regression models to predict workforce attrition rates for different nursing specialties, and demographic projections of demand for nursing labour.
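To show the mechanics of an attrition model, here is a dependency-free Python sketch of logistic regression fitted by batch gradient descent. In practice this would normally be done with a statistics package (for example R's `glm` with `family = binomial`); the function names, the single binary feature and the data in the usage note are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_predictor(weights, row):
    """Intercept plus weighted sum of features."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))


def fit_logistic(X, y, learning_rate=0.1, iterations=20000):
    """Fit a logistic regression by batch gradient descent on the log-loss.

    X: feature rows (no intercept column); y: 0/1 outcomes, where 1 means
    the person left the workforce during the year. Returns [intercept, *coefs].
    """
    n = len(X)
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(iterations):
        grad = [0.0] * len(weights)
        for row, target in zip(X, y):
            error = sigmoid(linear_predictor(weights, row)) - target
            grad[0] += error
            for j, x in enumerate(row):
                grad[j + 1] += error * x
        # Step against the average gradient of the log-loss.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad)]
    return weights


def predict_attrition(weights, row):
    """Predicted probability of leaving for one feature row."""
    return sigmoid(linear_predictor(weights, row))
```

With a single 0/1 predictor the fitted probabilities converge to the observed attrition rate in each group: for `X = [[0]] * 4 + [[1]] * 4` and `y = [0, 0, 0, 1, 1, 1, 0, 1]`, `predict_attrition` gives approximately 0.25 for `[0]` and 0.75 for `[1]`. Separate models (or specialty indicators as features) would give rates by nursing specialty.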
I built SAS and SQL queries to extract data from the IRD (tax) Oracle data warehouse and combine it with data from MSD systems. The project required streamlining the extraction of large volumes of data and merging it efficiently with Ministry of Social Development data.
I assisted with assembling data for a microsimulation model, integrating data from social surveys with data from Ministry systems. Data integration involved extracting data from the MSD data warehouse in SAS and integrating it with social surveys, including the Household Economic Survey and the longitudinal Survey of Family, Income and Employment (SoFIE).
My role in outpatient service pricing began with an analysis of existing cost data collections using analysis of variance and main effects analysis. As it became clear that costing data based on budget estimates from health boards was inconsistent, I investigated and promoted the use of hospital activity-based costing systems as a source of data. This eventually led to a new national data collection.
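For readers unfamiliar with analysis of variance, a minimal Python sketch of the one-way F statistic is below, framed as testing whether reported cost per attendance differs between health boards. The framing and function name are hypothetical, main effects analysis is not covered, and in practice a library routine such as `scipy.stats.f_oneway` would normally be used.

```python
def one_way_anova(groups):
    """One-way ANOVA F statistic for a list of groups of observations.

    groups: e.g. cost per attendance for one service, one list per board.
    Returns (F, df_between, df_within). A large F suggests the group means
    differ by more than the within-group variation would explain.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For example, `one_way_anova([[1, 2, 3], [4, 5, 6]])` returns `(13.5, 1, 4)`; the p-value would then come from the F distribution with those degrees of freedom.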
I worked in Tonga for three weeks on secondment to NZAID and the IMF, advising on and assisting with the rebasing of Tonga's CPI.
I was initially employed by the Ministry of Health to collect data to estimate the volumes of inter-district patients. Since no national collection existed, this involved negotiating in person with IT and decision support staff in every DHB in New Zealand and establishing common parameters for data collection.
My main role in index development was overseeing the reweighting of the Consumer Price Index. This involved supervising a small team responsible for collecting and amalgamating the necessary data, constructing and testing the new index structure, and publishing and documenting the results.
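To illustrate what reweighting involves, here is a minimal Python sketch of a fixed-weight (Laspeyres-type) price index, plus the linking step that joins a reweighted index onto the existing series so the published series stays continuous. This is a textbook simplification, not Statistics New Zealand's actual CPI methodology; the function names, item groups and numbers in the usage note are hypothetical.

```python
def fixed_weight_index(weights, base_prices, current_prices):
    """Laspeyres-type index: expenditure-share-weighted mean of price relatives.

    weights: expenditure shares by item (normalised here to sum to 1).
    base_prices / current_prices: prices by item in the price-reference
    period and the current period. Returns an index with base = 100.
    """
    total = sum(weights.values())
    return 100.0 * sum(
        (w / total) * current_prices[item] / base_prices[item]
        for item, w in weights.items()
    )


def link_series(old_index_at_link, new_index_values):
    """Chain-link a reweighted index onto the old series.

    new_index_values are computed with the new weights and equal 100 in
    the link period; rescaling by the old index's link-period value keeps
    the published series continuous across the reweight.
    """
    return [old_index_at_link * v / 100.0 for v in new_index_values]
```

For example, with weights `{"food": 0.2, "housing": 0.3, "transport": 0.5}` and price rises of 10%, 5% and 0% respectively, `fixed_weight_index` gives 103.5; linking new-basket values `[100.0, 102.0]` onto an old series standing at 1050 gives `[1050.0, 1071.0]`.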
I assisted with the redevelopment of a number of business surveys. A key part of this work was integrating tax data into each survey.
The business activity indicators project involved using GST data to produce new indicators of economic activity. A large part of the work involved improving existing code to handle what was, at the time, a large volume of data, and automating data cleaning so that data problems no longer had to be fixed by hand.