LearnMood/Model Algoritma/DataSynthetic.ipynb

{
"cells": [
{
"cell_type": "markdown",
"id": "c60c52da-31c9-46ff-ab3b-6bc851886f32",
"metadata": {},
"source": [
"## Library Imports & Initialization\n",
"\n",
"The first step sets the number of samples and fixes a random seed so that the random draws are reproducible."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "8bd2bcdb-740c-4e36-8771-b6627052ba6a",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"np.random.seed(42)\n",
"\n",
"n_samples = 2000\n",
"data = []"
]
},
{
"cell_type": "markdown",
"id": "60f026f9-5ae5-4157-ad15-64df947842d6",
"metadata": {},
"source": [
"###### pandas → data manipulation\n",
"###### numpy → random data generation\n",
"###### seed(42) → consistent (reproducible) random results\n",
"\n",
"The notebook generates 2000 synthetic records.\n",
"Each record is collected in a list before being converted to a DataFrame."
]
},
{
"cell_type": "markdown",
"id": "3a7788f3-c270-48a2-9442-41244e3864b7",
"metadata": {},
"source": [
"### Generating Sleep Duration, Mood from Sleep, and Study Duration from Mood\n",
"\n",
"Sleep duration is drawn from a normal distribution with a mean of 6.5 hours, then clipped to a realistic range (3 to 9 hours).\n",
"\n",
"Mood is assigned from sleep duration using a probabilistic approach, so that it reflects the relationship between physical and emotional condition.\n",
"\n",
"Study duration is determined by mood, on the assumption that emotional state affects the level of focus while studying."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "368ae584-e1cc-4b69-ac92-0c14cafab3e5",
"metadata": {},
"outputs": [],
"source": [
"for i in range(1, n_samples + 1):\n",
"    # Sleep duration (hours): normal(mean=6.5, sd=1.5), clipped to [3, 9]\n",
"    sleep = np.clip(np.random.normal(6.5, 1.5), 3, 9)\n",
"\n",
"    # Mood is drawn with probabilities that depend on sleep duration\n",
"    if sleep >= 7:\n",
"        mood = np.random.choice(\n",
"            ['Bagus', 'Lumayan', 'Biasa Saja'],\n",
"            p=[0.5, 0.3, 0.2]\n",
"        )\n",
"    elif sleep >= 5:\n",
"        mood = np.random.choice(\n",
"            ['Lumayan', 'Biasa Saja', 'Cukup Jenuh'],\n",
"            p=[0.4, 0.4, 0.2]\n",
"        )\n",
"    else:\n",
"        mood = np.random.choice(\n",
"            ['Biasa Saja', 'Cukup Jenuh', 'Jenuh'],\n",
"            p=[0.3, 0.4, 0.3]\n",
"        )\n",
"\n",
"    # Study duration (minutes) depends on mood\n",
"    if mood == 'Bagus':\n",
"        duration = np.random.normal(95, 20)\n",
"    elif mood == 'Lumayan':\n",
"        duration = np.random.normal(75, 20)\n",
"    elif mood == 'Biasa Saja':\n",
"        duration = np.random.normal(55, 15)\n",
"    elif mood == 'Cukup Jenuh':\n",
"        duration = np.random.normal(35, 10)\n",
"    else:  # Jenuh\n",
"        duration = np.random.normal(25, 10)\n",
"\n",
"    # Clip to 10-180 minutes, then store the row while still inside the loop\n",
"    duration = int(np.clip(duration, 10, 180))\n",
"    data.append([i, mood, duration, round(sleep, 1)])"
]
},
{
"cell_type": "markdown",
"id": "396c21e7-aab8-4af0-915f-e595cadac737",
"metadata": {},
"source": [
"Sleep duration is generated from a normal distribution with:\n",
"\n",
"* a mean of **6.5 hours**\n",
"* a standard deviation of 1.5 hours\n",
"\n",
"Mood is determined from sleep duration using weighted probabilities.\n",
"\n",
"* More sleep → a higher chance of a good mood.\n",
"* Less sleep → a higher chance of a burned-out (Jenuh) mood.\n",
"\n",
"Study duration is derived from the user's mood.\n",
"\n",
"* Good mood → study duration tends to be longer.\n",
"* Burned-out mood → study duration tends to be shorter.\n",
"\n",
"This step keeps the relationships between variables closer to real-world conditions instead of generating every variable fully at random."
]
},
{
"cell_type": "markdown",
"id": "efb60353-bf12-4c71-b3ec-43d4a6933c0e",
"metadata": {},
"source": [
"#### Study duration is limited to a minimum of 10 minutes and a maximum of 180 minutes.\n",
"\n",
"The clipping is applied inside the loop, before each row is stored, so that unrealistic values never reach the dataset."
]
},
{
"cell_type": "markdown",
"id": "1c793019-5f88-4a70-9464-462cc3a3155c",
"metadata": {},
"source": [
"## Building the Dataset\n",
"\n",
"The generated data is assembled into a DataFrame and saved as a CSV file."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0b2ea9bb-fda1-45a8-ab80-27ebb48072ed",
"metadata": {},
"outputs": [],
"source": [
"# Build the DataFrame from the rows collected in the loop\n",
"df = pd.DataFrame(data, columns=[\n",
"    'no',\n",
"    'mood',\n",
"    'durasi_belajar',\n",
"    'durasi_tidur'\n",
"])\n",
"\n",
"# Save to CSV\n",
"df.to_csv('data_sintetis_2000_tanpa_label.csv', index=False)\n",
"\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "aa08129e-b6ea-40c6-8546-04f6f0730b01",
"metadata": {},
"source": [
"## Label Construction (Pseudo-Labeling)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "2863896b-4c95-4aae-a963-d064aa16402a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>no</th>\n",
" <th>mood</th>\n",
" <th>durasi_belajar</th>\n",
" <th>durasi_tidur</th>\n",
" <th>mood_score</th>\n",
" <th>sleep_score</th>\n",
" <th>duration_score</th>\n",
" <th>total_score</th>\n",
" <th>label</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>Lumayan</td>\n",
" <td>72</td>\n",
" <td>7.2</td>\n",
" <td>1.5</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>5.5</td>\n",
" <td>Intensif</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" <td>Biasa Saja</td>\n",
" <td>59</td>\n",
" <td>4.8</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2.0</td>\n",
" <td>Ringan</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" <td>Bagus</td>\n",
" <td>110</td>\n",
" <td>8.9</td>\n",
" <td>2.0</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>6.0</td>\n",
" <td>Intensif</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" <td>Lumayan</td>\n",
" <td>64</td>\n",
" <td>5.6</td>\n",
" <td>1.5</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>4.5</td>\n",
" <td>Sedang</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" <td>Biasa Saja</td>\n",
" <td>26</td>\n",
" <td>6.9</td>\n",
" <td>1.0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>2.0</td>\n",
" <td>Ringan</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" no mood durasi_belajar durasi_tidur mood_score sleep_score \\\n",
"0 1 Lumayan 72 7.2 1.5 2 \n",
"1 2 Biasa Saja 59 4.8 1.0 0 \n",
"2 3 Bagus 110 8.9 2.0 2 \n",
"3 4 Lumayan 64 5.6 1.5 1 \n",
"4 5 Biasa Saja 26 6.9 1.0 1 \n",
"\n",
" duration_score total_score label \n",
"0 2 5.5 Intensif \n",
"1 1 2.0 Ringan \n",
"2 2 6.0 Intensif \n",
"3 2 4.5 Sedang \n",
"4 0 2.0 Ringan "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_labeled = df.copy()\n",
"\n",
"# 1. Map each mood category to a numeric score\n",
"mood_scores = {\n",
"    'Bagus': 2,\n",
"    'Lumayan': 1.5,\n",
"    'Biasa Saja': 1,\n",
"    'Cukup Jenuh': 0.5,\n",
"    'Jenuh': 0\n",
"}\n",
"\n",
"df_labeled['mood_score'] = df_labeled['mood'].map(mood_scores)\n",
"\n",
"# 2. Sleep duration score (hours)\n",
"def sleep_score(hours):\n",
"    if hours > 7:\n",
"        return 2\n",
"    elif hours >= 5:\n",
"        return 1\n",
"    else:\n",
"        return 0\n",
"\n",
"df_labeled['sleep_score'] = df_labeled['durasi_tidur'].apply(sleep_score)\n",
"\n",
"# 3. Study duration score (minutes)\n",
"def duration_score(minutes):\n",
"    if minutes > 60:\n",
"        return 2\n",
"    elif minutes > 30:\n",
"        return 1\n",
"    else:\n",
"        return 0\n",
"\n",
"df_labeled['duration_score'] = df_labeled['durasi_belajar'].apply(duration_score)\n",
"\n",
"# 4. Total score (ranges from 0 to 6)\n",
"df_labeled['total_score'] = (\n",
"    df_labeled['mood_score'] +\n",
"    df_labeled['sleep_score'] +\n",
"    df_labeled['duration_score']\n",
")\n",
"\n",
"# 5. Build the pseudo-label\n",
"# Ringan = light, Sedang = moderate, Intensif = intensive\n",
"def categorize(score):\n",
"    if score <= 3:\n",
"        return 'Ringan'\n",
"    elif score <= 4.5:\n",
"        return 'Sedang'\n",
"    else:\n",
"        return 'Intensif'\n",
"\n",
"df_labeled['label'] = df_labeled['total_score'].apply(categorize)\n",
"\n",
"df_labeled.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "7f81bc75-ac2a-403d-937f-f091e384a858",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"label\n",
"Ringan 780\n",
"Sedang 675\n",
"Intensif 545\n",
"Name: count, dtype: int64"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_labeled['label'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "232c5b2d-532e-43cf-8808-4abcc89ab7c0",
"metadata": {},
"outputs": [],
"source": [
"df_labeled.to_csv('data_sintetis_2000_dengan_pseudo_label.csv', index=False)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "4da2a7fa-38fe-485d-9738-e54b75b1fc3a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>mood</th>\n",
" <th>durasi_belajar</th>\n",
" <th>durasi_tidur</th>\n",
" <th>label</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Lumayan</td>\n",
" <td>72</td>\n",
" <td>7.2</td>\n",
" <td>Intensif</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Biasa Saja</td>\n",
" <td>59</td>\n",
" <td>4.8</td>\n",
" <td>Ringan</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Bagus</td>\n",
" <td>110</td>\n",
" <td>8.9</td>\n",
" <td>Intensif</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Lumayan</td>\n",
" <td>64</td>\n",
" <td>5.6</td>\n",
" <td>Sedang</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Biasa Saja</td>\n",
" <td>26</td>\n",
" <td>6.9</td>\n",
" <td>Ringan</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" mood durasi_belajar durasi_tidur label\n",
"0 Lumayan 72 7.2 Intensif\n",
"1 Biasa Saja 59 4.8 Ringan\n",
"2 Bagus 110 8.9 Intensif\n",
"3 Lumayan 64 5.6 Sedang\n",
"4 Biasa Saja 26 6.9 Ringan"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Keep only the required columns\n",
"df_final = df_labeled[['mood', 'durasi_belajar', 'durasi_tidur', 'label']]\n",
"\n",
"df_final.head()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "3f6058f0-ba14-4ec2-ab41-2bdd124203e6",
"metadata": {},
"outputs": [],
"source": [
"df_final.to_csv('data_final_2000_dengan_label.csv', index=False)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.12",
"language": "python",
"name": "python312"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}