What does ETL mean?
ETL stands for Extract, Transform, and Load. This describes the process by which we bring your data into your data warehouse and prep it for populating the Schoolzilla dashboards.
What does the ETL process look like?
First, we connect to your data source and copy data into your data warehouse. For student information systems, that's often done through a direct database connection (ODBC) or an application programming interface (API). For assessments, this is commonly done through a file upload to My Data, or a file drop from the vendor.
After loading the data into our "raw" schema, we transform the data with our connector code into our "staged" format. This serves to standardize your data across sources. For example, our Illuminate Data and Assessment connector code transforms Illuminate Benchmark data into the "staged" tables so it can sit alongside your state test score data.
"Staged" data is a consolidated, clean and organized copy of raw data. During staging Schoolzilla's data team:
- Consolidates data from multiple source systems
- Changes data types
- Conventionalizes field names
- Performs simple calculations
- Applies a number of gate keeping rules to identify, drop and document invalid rows of data.
Lastly, we convert the "staged" data into our star schema in the "dw" tables, comprising dimension and fact tables.
What is a star schema?
In data warehousing, a star schema is a dimensional model in which data are organized into facts and dimensions. A star schema is diagrammed by surrounding each fact with its associated dimensions resembling a star.
Why does Schoolzilla structure data this way?
Schoolzilla chose to structure data blocks in this fashion because it is optimal for querying large data sets. A star schema supports rapid aggregations of the measures stored in the fact table and can be easily filtered and grouped by the dimension tables. For example, the data relationships in the Test Scores star schema (one piece of the Achievement Data Block) depicted above allows you to easily average test scores for a specific test by school or grade level and filter for a particular school year.
This data structure also works well with Tableau and other data visualization tools!
What are dimension tables (dims)?
Dimension tables store attributes or reference information about a measurable event. Every row of data in a dimension table includes a unique key to link the data in that dimension table to a fact table.
Examples:
-
dimSchool: Up-to-date data about schools in your system. Data includes: contact information, school level (elementary, middle, etc.), and school rating.
-
dimStudent: Up-to-date data about your students. Data includes: addresses, demographic data, school enrollment data, and grade level. (Note: dimStudent in the student achievement data block only includes students for which we have testing data.)
What are fact tables (facts)?
A fact table stores the measures (or facts) of a given event. Events can be assessments, daily or class attendance, behavior, an observation, etc. Measures are the quantitative or factual data associated with the event. Measures are numeric and answer how much or how many. For example, a test score is the measurement associated with an assessment event. Some events, such as behavior events, will only have the "Number of Records" as a measurement. In these cases the number of records measurement indicates how many events occurred (e.g. the count of behavior incidents).
Fact tables also include unique keys in each row to tie the measurement of an event to the dimensions associated with that event. After joining to dimension tables, data in a fact table can be filtered or grouped by various combinations of attributes in the dimension tables.
Other dims and facts explained:
- dimSchoolCalendar: This is a large calendar template in table form that use as a reference for school-specific dates such as first and last days of school, test dates, etc.
- dimStudentPersonalized and dimSchoolPersonalized: The dimStudentPersonalized and dimSchoolPersonalized tables are available to personalize reports so teachers/school leaders see only data for students enrolled in their courses/school.